<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY rfc3711 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3711.xml">
<!ENTITY rfc3903 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3903.xml">
<!ENTITY rfc3261 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3261.xml">
<!ENTITY rfc3830 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3830.xml">
<!ENTITY I-D.ietf-sip-media-security-requirements SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-sip-media-security-requirements.xml">
<!ENTITY rfc4568 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4568.xml">
<!ENTITY I-D.ietf-sipping-config-framework SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-sipping-config-framework.xml">
<!ENTITY I-D.ietf-sip-sips SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-sip-sips.xml">
<!ENTITY I-D.ietf-sip-saml SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-sip-saml.xml">
<!ENTITY I-D.zimmermann-avt-zrtp SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.zimmermann-avt-zrtp.xml">
<!ENTITY rfc4117 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4117.xml">
<!ENTITY rfc4317 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4317.xml">
<!ENTITY rfc2804 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2804.xml">
<!ENTITY rfc3725 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3725.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc iprnotified="yes" ?>
<?rfc strict="yes" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>
<?rfc sortrefs="yes" ?>
<?rfc colonspace='yes' ?>
<?rfc tocindent='yes' ?>
<?rfc rfcprocack="yes"?>
<rfc category="std" docName="draft-wing-sipping-srtp-key-03" ipr="full3978">
  <front>
    <title abbrev="SRTP Recording with SIP">Secure Media Recording and
    Transcoding with the Session Initiation Protocol</title>

    <author fullname="Dan Wing" initials="D." surname="Wing">
      <organization abbrev="Cisco">Cisco Systems, Inc.</organization>

      <address>
        <postal>
          <street>170 West Tasman Drive</street>

          <city>San Jose</city>

          <region>CA</region>

          <code>95134</code>

          <country>USA</country>
        </postal>

        <email>dwing@cisco.com</email>
      </address>
    </author>

    <author fullname="Francois Audet" initials="F." surname="Audet">
      <organization abbrev="Nortel">Nortel</organization>

      <address>
        <postal>
          <street>4655 Great America Parkway</street>

          <city>Santa Clara</city>

          <region>CA</region>

          <code>95054</code>

          <country>USA</country>
        </postal>

        <email>audet@nortel.com</email>
      </address>
    </author>

    <author fullname="Steffen Fries" initials="S." surname="Fries">
      <organization>Siemens AG</organization>

      <address>
        <postal>
          <street>Otto-Hahn-Ring 6</street>

          <city>Munich</city>

          <region>Bavaria</region>

          <code>81739</code>

          <country>Germany</country>
        </postal>

        <email>steffen.fries@siemens.com</email>
      </address>
    </author>

    <author fullname="Hannes Tschofenig" initials="H" surname="Tschofenig">
      <organization>Nokia Siemens Networks</organization>

      <address>
        <postal>
          <street>Otto-Hahn-Ring 6</street>

          <city>Munich</city>

          <region>Bavaria</region>

          <code>81739</code>

          <country>Germany</country>
        </postal>

        <email>Hannes.Tschofenig@nsn.com</email>

        <uri>http://www.tschofenig.com</uri>
      </address>
    </author>

    <author fullname="Alan Johnston" initials="A" surname="Johnston">
      <organization>Avaya</organization>

      <address>
        <postal>
          <street></street>

          <city>St. Louis</city>

          <region>MO</region>

          <country>USA</country>
        </postal>

        <email>alan@sipstation.com</email>
      </address>
    </author>

    <date year="2008" />

    <workgroup>SIPPING Working Group</workgroup>

    <abstract>
      <t>Call recording is an important feature in enterprise telephony
      applications. Some industries, such as financial trading, are
      required to record all calls in which customers give trading orders.
      This poses a particular problem for Secure RTP systems as many SRTP key
      exchange mechanisms do not disclose the SRTP session keys to
      intermediate SIP proxies. As a result, these key exchange mechanisms
      cannot be used in environments where call recording is needed.</t>

      <t>This document specifies a secure mechanism for a cooperating endpoint
      to disclose its SRTP master keys to an authorized party to allow secure
      call recording.</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>Call recording is an important feature in enterprise telephony
      applications. Some industries, such as financial trading, are
      required to record all calls in which customers give trading orders.
      In others, calls are recorded, as the near ubiquitous announcement says,
      "for training and quality control purposes".</t>

      <t>Note that the services and examples in this document are not
      wiretapping as defined in <xref target="RFC2804">Raven</xref>.
      Specifically, all recording done by enterprises is always announced to
      both parties. Also, in most circumstances, the intent of the recording
      is to protect both parties from later disagreements about what was said
      during the conversation or to remedy mistakes made.</t>

      <t>First, four different recording modes are discussed. Then, example
      call flows show how these modes can be implemented using standard SIP
      primitives. Finally, the impact of encrypted media (SRTP) is
      discussed.</t>
    </section>

    <section title="Terminology">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119"></xref> and indicate requirement levels for compliant
      mechanisms.</t>

      <t>The following terminology is taken directly from <xref
      target="RFC3903">SIP Event State Publication Extension</xref>:</t>

      <t><list style="hanging">
          <t hangText="Event Publication Agent (EPA):">The User Agent Client
          (UAC) that issues PUBLISH requests to publish event state.</t>

          <t hangText="Event State Compositor (ESC):">The User Agent Server
          (UAS) that processes PUBLISH requests, and is responsible for
          compositing event state into a complete, composite event state of a
          resource.</t>

          <t hangText="Publication:">The act of an EPA sending a PUBLISH
          request to an ESC to publish event state.</t>
        </list></t>
    </section>

    <section anchor="sec-introduction-to-call-recording"
             title="Introduction to SRTP Call Recording">
      <t>This document addresses two difficulties with end-to-end encryption
      of RTP (<xref target="RFC3711">SRTP</xref>): transcoding and media
      recording. When peering with other networks, different codecs are
      sometimes necessary (e.g., transcoding a surround-sound codec to a
      highly compressed codec for transmission over a bandwidth-constrained
      network). In
      some environments (e.g., stock brokerages and banks) regulations and
      business needs require recording calls with coworkers or with customers.
      In many environments, quality problems such as echo can only be
      diagnosed by listening to the call (analyzing SRTP headers is not
      sufficient).</t>

      <t>With an RTP stream, transcoding is accomplished by modifying SDP to
      offer a different codec through a transcoding device <xref
      target="RFC4117"></xref>, and call recording or monitoring can be
      accomplished with an Ethernet sniffer listening for SIP and its
      associated RTP, with a media relay, or with a Session Border Controller.
      However, when media is encrypted end-to-end <xref
      target="I-D.ietf-sip-media-security-requirements"></xref>, these
      existing techniques fail because they are unable to decrypt the media
      packets.</t>

      <t>When a media session is encrypted with SRTP, there are three
      techniques to decrypt the media for monitoring or call recording:</t>

      <t><list style="numbers">
          <t>The endpoint establishes a separate media stream to the recording
          device, with a separate SRTP key, and sends the (mixed) media to the
          recording device. This technique is often called 'active
          recording'. Its disadvantages include doubling the bandwidth
          requirements in the network and increasing the processing load on
          the client. Moreover, loss of the media recording facility does not
          cause loss of the call (as is required in some environments).
          Depending on the application requirements, it may be necessary to
          establish a reliable connection to the recording device to cope with
          possible packet loss on the unreliable transport typically used for
          media. Because the endpoint maintains its own key with the connected
          party, this technique is more secure: a malicious media recording
          device cannot inject media to the connected party on behalf of the
          endpoint.</t>

          <t>The endpoint relays media through a device which forks a separate
          media stream to the recording device. This technique is often
          employed by Session Border Controllers. This relay does not, itself,
          have access to the SRTP key.</t>

          <t>Network monitoring devices are used to listen to the SRTP traffic
          and correlate SRTP with SIP. This correlation requires cooperation
          of call signaling devices if the call signaling is encrypted (e.g.,
          with TLS).</t>
        </list></t>

      <t>This document describes cases (2) and (3) where a cooperating
      endpoint publishes its SRTP master keys to an authorized party using the
      <xref target="RFC3903">SIP Event State Publication Extension</xref>. The
      mechanism can be described as passive recording, as the client is not
      directly involved with the media recording. The client merely provides
      the key information to a recording device. The mechanism described in
      this document allows secure disclosure of SRTP session keys to
      authorized parties so that an endpoint's media stream can be transcoded
      or decrypted, as needed by that environment. Technique (1) stated above is
      not considered further in this document, as it does not require the
      disclosure of the key used for the communication between the two
      endpoints.</t>
    </section>

    <section title="Recording Modes">
      <t>There are four common modes of call recording which are described in
      the following sections.</t>

      <section title="Always On Recording">
        <t>In the Always On recording mode, for an identified endpoint, phone
        number, user or agent, all calls, both incoming and outgoing, are
        recorded. For example, a toll-free helpline could use this mode to
        record the entire content of every call.</t>
      </section>

      <section title="Recording On Demand">
        <t>In the Recording On Demand recording mode, only certain calls are
        recorded. For example, in a call center application, personal or
        non-call center calls by an agent might not be recorded.</t>
      </section>

      <section title="Required Recording">
        <t>In the Required Recording mode, the requirement for recording is so
        strong that if call recording resources are unavailable, the call must
        not be set up or an existing call must be disconnected.</t>
      </section>

      <section title="Pause and Resume Recording">
        <t>In the Pause and Resume Recording mode, only parts of a given call
        may be recorded. For example, when the call is placed on hold,
        recording may be paused and then resumed when the call is resumed.
        Similarly, IVR interactions in which a user enters account numbers and
        PINs should not be recorded, as the DTMF tones convey private or
        secure information. Pausing can be unidirectional or
        bi-directional.</t>
      </section>
    </section>

    <section title="Recording Call Flows">
      <t>This section shows how these four recording modes can be
      implemented.</t>

      <t>In SIP call recording, the two-way RTP or SRTP media session between
      two UAs is sent to a UA referred to as a Recording UA. While it is
      possible for recording to be done locally in a UA, this has no impact on
      the SIP call flows.</t>

      <t>While it is also possible for the recording policy and decision
      making to be included in an endpoint, it is more common to have a third
      party control recording and cause the RTP or SRTP to be sent to the
      Recording UA. In these call flows, this third party will be called the
      Controller.</t>

      <t>If the Controller acts as a third party call controller <xref
      target="RFC3725">(3PCC)</xref>, it is possible for the Controller to
      cause each UA to send an extra media stream to the Recorder. However,
      for this call flow to work:</t>

      <t><list style="numbers">
          <t>Both UAs must support multiple media lines and streams sent to
          different addresses (e.g., Section 2.4 of <xref target="RFC4317">SDP
          Examples</xref>).</t>

          <t>Both UAs must have twice the normal bandwidth available.</t>

          <t>Both UAs must know to send the same media on both media
          streams.</t>
        </list></t>

      <t>While requirements 1 and 2 can be met, requirement 3 is the most
      difficult: without additional information in the SDP, each media stream
      is treated as an independent stream, so a UA has no indication that
      both streams should carry the same media.</t>

      <t>Alternatively, the Controller could be a combination of a SIP Proxy
      and a media relay (e.g., a Session Border Controller). This media relay
      would copy media streams to a second location. The protocol and
      coordination between these two elements is outside the scope of this
      specification. In another model, discussed in the Conference Recording
      section below, the Controller could be a SIP Focus and a Media Server
      with some special logic. Finally, the Controller could be realized as a
      B2BUA.</t>

      <t>Using this model, there are no SIP, SDP, or bandwidth requirements on
      either UA. The Controller then can cause the media received at the Media
      Relay to be copied to the Recorder. An example is shown in <xref
      target="fig-controller"></xref>, below, where the Recorder records a
      call between Alice and Bob.</t>

      <figure anchor="fig-controller" title="Controller Proxy or B2BUA">
        <artwork align="center"><![CDATA[
Alice          Controller        Bob               Recorder
  |                |              |                   |
  |      INVITE F1 |              |                   |
  |--------------->|              |                   |
  |(100 Trying) F2 |              |                   |
  |<---------------|   INVITE F3  |                   |
  |                |--------------------------------->|
  |                |              |    200 OK F4      |
  |                |<---------------------------------|
  |                |              |      ACK F5       |
  |                |--------------------------------->|
  |                |   INVITE F6  |                   |
  |                |------------->|                   |
  |                |180 Ringing F7|                   |
  |                |<-------------|                   |
  | 180 Ringing F8 |              |                   |
  |<---------------|  200 OK F9   |                   |
  |                |<-------------|                   |
  |    200 OK F10  |              |                   |
  |<---------------|              |                   |
  |     ACK F11    |              |                   |
  |--------------->|     ACK F12  |                   |
  |                |------------->|                   |
  |                |   INVITE F13 |                   |
  |                |--------------------------------->|
  |                |              |    200 OK F14     |
  |                |<---------------------------------|
  |                |              |      ACK F15      |
  |                |--------------------------------->|
  |    Both way SRTP Established  |                   |
  |<==============>|<============>|                   |
  |                |  SRTP From Alice                 |
  |                |=================================>|
  |                |  SRTP From Bob                   |
  |                |=================================>|
]]></artwork>
      </figure>

      <t>The following sections will discuss and extend this basic call flow
      for the four recording modes.</t>

      <section title="Always On Recording">
        <t>The Always On recording mode for the user Bob can be implemented
        using the call flow of <xref target="fig-controller"></xref> if every
        call made to Bob is handled in this way.</t>
      </section>

      <section title="Recording On Demand">
        <t>In the Recording On Demand recording mode, the call flow of <xref
        target="fig-controller"></xref> is used selectively - only for the
        calls that need to be recorded. For the non-recorded flows, the
        Controller could act as a Proxy Server and make no changes to the
        signaling or media flows. By not inserting a Record-Route, the
        Controller could even drop out of the SIP dialog for calls where
        recording is not of interest.</t>
      </section>

      <section title="Required Recording">
        <t>Required recording could also be implemented using <xref
        target="fig-controller"></xref>, as the INVITE is sent first to the
        Recorder before being sent to Bob. As a result, if the INVITE is
        refused (i.e., the Recorder is unable to record the call), the INVITE
        will not be forwarded to Bob and the call is refused. Also, if the
        Recorder disconnects during the call or is unable to provide recording
        resources (e.g., its disks are full), the BYE from the Recorder can be
        used to terminate the call to Bob. This is shown in <xref
        target="fig-required-recording"></xref>, below.</t>

        <figure anchor="fig-required-recording"
                title="Required Recording Call Flow">
          <artwork align="center"><![CDATA[
Alice          Controller        Bob               Recorder
  |                |              |                   |
  |    Both way SRTP Established  |                   |
  |<==============>|<============>|                   |
  |                |  SRTP From Alice                 |
  |                |=================================>|
  |                |  SRTP From Bob                   |
  |                |=================================>|
  |                |              |                   |
  |                |            BYE F1                |
  |                |<---------------------------------|
  |                |          200 OK  F2              |
  |                |--------------------------------->|
  |                |              |                   |
  |     BYE F3     |              |                   |
  |<---------------|              |                   |
  |    200 OK F4   |              |                   |
  |--------------->|              |                   |
]]></artwork>
        </figure>
      </section>

      <section title="Pause and Resume Recording Call Flow">
        <t>The Pause and Resume recording mode can be initiated by the call
        flow of <xref target="fig-controller"></xref>. When the recording is
        to be paused, for example, when the caller Alice places the call on
        hold, the hold re-INVITE from Alice causes the Controller to place the
        call to the Recorder on hold as well. No media is sent to the Recorder
        until a re-INVITE starts the recording again, as shown in <xref
        target="fig-pause-resume"></xref>, below.</t>

        <figure anchor="fig-pause-resume" title="Pause and Resume Call Flow">
          <artwork align="center"><![CDATA[
Alice          Controller        Bob               Recorder
  |                |              |                   |
  |    Both way SRTP Established  |                   |
  |<==============>|<============>|                   |
  |                |  SRTP From Alice                 |
  |                |=================================>|
  |                |  SRTP From Bob                   |
  |                |=================================>|
  | INVITE (hold) F1              |                   |
  |--------------->|   INVITE (inactive) F2           |
  |                |--------------------------------->|
  |                |      200 OK (inactive) F3        |
  |                |<---------------------------------|
  |                |              |      ACK F4       |
  |                |--------------------------------->|
  |                |INVITE (hold) F5                  |
  |                |------------->|                   |
  |                |200 OK (hold) F6                  |
  |                |<-------------|                   |
  | 200 OK (hold) F7              |                   |
  |<---------------|              |                   |
  |     ACK F8     |              |                   |
  |--------------->|     ACK F9   |                   |
  |                |------------->|                   |
  |                |              |                   |
  |                    No SRTP Sent                   |
]]></artwork>
        </figure>
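
        <t>As an illustrative sketch of the pause, the SDP carried in the
        re-INVITE from the Controller to the Recorder could mark each media
        line inactive. This assumes the Recorder session uses two media
        lines; the ports and payload type are hypothetical, and session-level
        lines and keying attributes are omitted:</t>

        <figure>
          <artwork><![CDATA[
  m=audio 49170 RTP/SAVP 0
  a=inactive
  m=audio 49172 RTP/SAVP 0
  a=inactive
]]></artwork>
        </figure>

        <t>Under the same assumptions, recording could be resumed by a
        further re-INVITE that restores a=sendonly on each media line.</t>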
      </section>

      <section title="Conference Recording">
        <t>A call flow for conference recording is shown in <xref
        target="fig-alternative"></xref>, below. This call flow is similar to
        the previous ones, except that a focus takes the place of the
        Controller. The Recorder SUBSCRIBEs to the focus using the conference
        event package to learn of call recording events of interest to the
        Recorder.</t>

        <t>With the subscription established by the SUBSCRIBE, the Recorder
        receives a NOTIFY from the focus whenever a recording event of
        interest occurs. For example, the Recorder is informed when Alice
        joins the conference (NOTIFY F8), but recording is not initiated.
        Notification that Bob has joined the conference is then received in
        the NOTIFY, F13. In this example, the Recorder decides to record the
        call and sends an INVITE with a Join header field to the focus, F15.
        The dialog information used to construct the Join header field is
        obtained from the NOTIFY, F13. The Focus/Mixer then begins to stream
        the media to the Recorder for the duration of the conference.</t>

        <t>This model could be used for other recording modes. In this case,
        the event package would be a new event package specifically tailored
        to the recording application, containing all the information needed by
        a Recorder to make a decision on whether or not to record a call. The
        details of this event package may be defined in a future draft. Note
        that CTI (Computer Telephony Integration) protocols are presently
        used for this purpose.</t>

        <figure anchor="fig-alternative"
                title="Conference Recording Call Flow">
          <artwork align="center"><![CDATA[
Alice         Focus/Mixer        Bob               Recorder
  |                |              |                   |
  |                |   SUBSCRIBE F1                   |
  |                |<---------------------------------|
  |                |              |    200 OK F2      |
  |                |--------------------------------->|
  |                |   NOTIFY F3  |                   |
  |                |--------------------------------->|
  |                |              |    200 OK F4      |
  |                |<---------------------------------|
  |      INVITE F5 |              |                   |
  |--------------->|              |                   |
  |    200 OK F6   |              |                   |
  |<---------------|              |                   |
  |     ACK F7     |              |                   |
  |--------------->|              |                   |
  |     SRTP       |   NOTIFY F8  |                   |
  |<==============>|--------------------------------->|
  |                |              |    200 OK F9      |
  |                |<---------------------------------|
  |                |  INVITE F10  |                   |
  |                |<-------------|                   |
  |                |180 Ringing F11                   |
  |                |------------->|                   |
  |                |  200 OK F12  |                   |
  |                |------------->|                   |
  |                |     SRTP     |                   |
  |                |<============>|                   |
  |                |   NOTIFY F13 |                   |
  |                |--------------------------------->|
  |                |              |    200 OK F14     |
  |                |<---------------------------------|
  |                |   INVITE Join: A-B F15           |
  |                |<---------------------------------|
  |                |              |    200 OK F16     |
  |                |--------------------------------->|
  |                |              |      ACK F17      |
  |                |<---------------------------------|
  |                | Mixed SRTP from Alice and Bob    |
  |                |=================================>|
]]></artwork>
        </figure>
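
        <t>For illustration, an abbreviated form of the INVITE that the
        Recorder sends to join the conference might look like the sketch
        below. The URIs, Call-ID values, and tags are hypothetical, and header
        fields such as Via, Max-Forwards, and Contact, as well as the SDP
        body, are omitted. The Join header field (defined in RFC 3911)
        carries the Call-ID and tags of the dialog to be joined, which the
        Recorder is assumed to have learned from a NOTIFY:</t>

        <figure>
          <artwork><![CDATA[
  INVITE sip:conf-1234@focus.example.com SIP/2.0
  From: <sip:recorder@example.com>;tag=7743
  To: <sip:conf-1234@focus.example.com>
  Call-ID: 7Hwe5@recorder.example.com
  CSeq: 1 INVITE
  Join: 425928@bob.example.com;to-tag=7553452;from-tag=31431
]]></artwork>
        </figure>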
      </section>
    </section>

    <section title="Transcoding">
      <t>There are similarities between transcoding and call recording,
      especially technique 2 described in <xref
      target="sec-introduction-to-call-recording"></xref>. An endpoint that
      desires transcoding can provide its SRTP key to a transcoder and request
      its services.</t>

      <t>[[This section is a placeholder, and will be expanded in a later
      version of this document.]]</t>
    </section>

    <section title="Media Considerations">
      <t>The following sections will discuss considerations relating to the
      media streams.</t>

      <section title="Offer/Answer Considerations">
        <t>For the call flows in this document, it is assumed that a single
        bi-directional media stream is to be recorded. Normally, this would be
        negotiated using a single media line (m= line) in the SDP with the
        default direction attribute (a=sendrecv). The media can be sent from
        the Controller to the Recorder in two different ways, depending on
        the media handling in the Controller. In the simplest case, each
        direction of the media stream between Alice and Bob is converted to a
        separate uni-directional media stream sent to the Recorder. In the
        INVITE from the Controller to the Recorder, for a single recording
        session, there would be two media lines (m=), each marked as send
        only (a=sendonly). This has the advantage that the Controller does
        not have to perform any processing on the RTP packets: they are
        simply forwarded without changing the SSRC or sequence numbers. The
        Recording device can then mix the packets together or record the two
        sides of the conversation separately, if desired.</t>
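
        <t>As a rough illustration of the two-media-line approach, the SDP
        offered by the Controller to the Recorder might resemble the sketch
        below. The host name, addresses, ports, and payload type are
        hypothetical, and the attributes that convey SRTP keying material are
        omitted for brevity:</t>

        <figure>
          <artwork><![CDATA[
  v=0
  o=controller 2890844526 2890844526 IN IP4 controller.example.com
  s=-
  c=IN IP4 192.0.2.10
  t=0 0
  m=audio 49170 RTP/SAVP 0
  a=sendonly
  m=audio 49172 RTP/SAVP 0
  a=sendonly
]]></artwork>
        </figure>

        <t>In this sketch, the first media line would carry the media from
        Alice and the second the media from Bob; which line carries which
        direction is an assumption of the example, not something these SDP
        lines themselves convey.</t>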

        <t>In the other model, the Controller can function as an RTP mixer, in
        which case a single uni-directional media stream will be used with a
        single media line. The Controller will need to process the RTP packets
        by mixing them and including its own SSRC and sequence number in the
        resulting RTP packets. The Recorder will then not have to mix them,
        but it also loses the option of recording the two sides separately.</t>

        <t>The approach of using two separate media lines is the recommended
        one as it allows for simple RTP packet processing at the Controller
        and also provides recording flexibility at the Recorder. However, a
        Recorder should also be able to handle the case where the Controller
        performs the mixing as well.</t>
      </section>

      <section title="Operation">
        <t>For transcoding, RTP packets must be sent from and received by a
        device which performs the transcoding. When the media is encrypted,
        this device must be capable of decrypting the media, performing the
        transcoding function, and re-encrypting the media.</t>

        <t><list>
            <t>ISSUE-1: should we consider providing some or all of the SIP
            headers, as well? Some recording functions will need to know the
            identity of the remote party. This information could be gleaned
            from the SIP proxies, though, and starts to fall outside the
            intended scope of this document.</t>

            <t>ISSUE-2: The authors have been considering use of <xref
            target="RFC3830">MIKEY</xref>, but MIKEY may not be usable off
            the shelf. Certain changes to the state machine may have to be made
            (<xref target="RFC3830">MIKEY</xref> describes the TGK transport
            rather than SRTP master key transport).</t>
          </list></t>

        <section title="Learning Name and Certificate of ESC">
          <t>The endpoint will be configured with the AOR of its ESC (e.g.,
          "transcoder@example.com"). If S/MIME is used to send the SRTP master
          key to the ESC, the endpoint is additionally configured with the
          certificate of its ESC.</t>

          <t>The name and public key of the ESC are configured into the
          endpoint. It is vital that the public key of the ESC is not changed
          by an unauthorized user, because SRTP key disclosures would then be
          encrypted with the substituted key. It is RECOMMENDED that endpoints
          restrict changing the public key of the disclosure device using
          protections similar to those applied to changes of the endpoint's
          SIP username and SIP password.</t>
        </section>

        <section title="Authorization of ESC">
          <t>Depending on the application, authorization of the key disclosure
          and distribution to the ESC may be necessary in addition to the
          transport security of the key distribution itself. This may be the
          case when the <xref
          target="I-D.ietf-sipping-config-framework">configuration
          framework</xref> is not applied and thus the information about the
          ESC is not known to the client.</t>

          <t>This can be done by providing a <xref
          target="I-D.ietf-sip-saml">SAML extension</xref> in the header of
          the SUBSCRIBE message. The SAML assertion shall at least contain the
          information about the ESC, call-related information to associate the
          call with the assertion (editor's note: we may also define wildcards
          here to allow for recordings of all phone calls for a day,
          independent of the call) and a reference to the certificate for the
          ESC. The latter information is needed to transport the SRTP Session
          Key to the ESC in a protected manner, as described in the section
          below.</t>

          <t>The signature of the SAML assertion should be produced using the
          private key of the domain certificate. This certificate MUST have a
          SubjAltName which matches the domain of the user agent's SIP proxy
          (that is, if the SIP proxy is sip.example.com, the SubjAltName of
          the domain certificate signing this SAML assertion MUST be
          example.com). Here, the main focus is placed on communication of
          clients with the ESC, which belongs to the client's home domain.</t>
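
          <t>The SubjAltName rule can be illustrated with a short,
          non-normative Python sketch. This document does not specify how
          the domain is derived from the proxy's host name; the sketch
          assumes, purely for illustration, that the leftmost label is
          dropped when the name has three or more labels (so that
          sip.example.com yields example.com). The function names
          proxy_domain and san_matches_proxy are hypothetical.</t>

          <figure title="Illustrative SubjAltName check (non-normative)">
            <artwork><![CDATA[
```python
from typing import Iterable


def proxy_domain(proxy_host: str) -> str:
    """Derive the domain from the proxy's host name.

    Assumption (not specified by this document): drop the
    leftmost label when the name has three or more labels.
    """
    labels = proxy_host.rstrip(".").lower().split(".")
    if len(labels) >= 3:
        labels = labels[1:]
    return ".".join(labels)


def san_matches_proxy(proxy_host: str,
                      san_dns_names: Iterable[str]) -> bool:
    """True if any SubjAltName DNS entry equals the domain."""
    wanted = proxy_domain(proxy_host)
    return any(n.rstrip(".").lower() == wanted
               for n in san_dns_names)
```
]]></artwork>
          </figure>

          <t>A real implementation would extract the SubjAltName entries
          from the validated domain certificate before applying such a
          comparison; that step is omitted here.</t>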
        </section>

        <section title="Sending SRTP Session Keys to ESC">
          <t>SDP is used to describe the media session to the ESC. However,
          the existing <xref target="RFC4568">Security Descriptions</xref>
          only describes the master key and parameters of the SRTP packets
          being sent -- it does not describe the master key (and parameters)
          of the SRTP being received, or the SSRC being transmitted. For
          transcoding and media recording, both the sending key and receiving
          key are needed and in some cases the SSRC is needed.</t>

          <t>Thus, we hereby extend the existing crypto attribute to indicate
          the SSRC. We also create a new SDP attribute, "rcrypto", which is
          identical to the existing "crypto" attribute, except that it
          describes the receiving keys and their SSRCs. For example:</t>

          <figure anchor="sdp_example" title="Example SDP">
            <artwork><![CDATA[
  a=crypto:1 AES_CM_128_HMAC_SHA1_80
    inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32
    SSRC=1899
  a=rcrypto:1 AES_CM_128_HMAC_SHA1_80
    inline:AmO4q1OVAHNiYRj6HmS3JFWNCFqSpTqHWKKIN1Mw|2^20|1:32
    SSRC=3289
  a=rcrypto:1 AES_CM_128_HMAC_SHA1_80
    inline:Hw3JFWNCFqSpTqNiYRj6HmSWKMHAmO4q1KIN1OVA|2^20|1:32
    SSRC=4893
]]></artwork>

            <postamble></postamble>
          </figure>
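
          <t>As a minimal, non-normative sketch of how a recipient might
          read these attributes, the following Python fragment parses one
          "crypto" or "rcrypto" line into its tag, crypto-suite, key
          parameters, and SSRC list. It assumes that each attribute occupies
          a single line (the example above folds lines only for display) and
          that SSRC values appear as trailing "SSRC=" tokens, since the
          formal grammar is not yet defined. The names CryptoAttr and
          parse_crypto_line are illustrative and are not defined by this
          document.</t>

          <figure title="Illustrative attribute parser (non-normative)">
            <artwork><![CDATA[
```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical pattern: attribute name, tag, crypto-suite,
# key-params, then any trailing tokens (e.g., SSRC=...).
_ATTR = re.compile(r"a=(r?crypto):(\d+)\s+(\S+)\s+(\S+)(.*)$")


@dataclass
class CryptoAttr:
    direction: str    # "send" for crypto, "recv" for rcrypto
    tag: int
    suite: str
    key_params: str   # e.g. "inline:...|2^20|1:32"
    ssrcs: List[int]  # empty if no SSRC= token is present


def parse_crypto_line(line: str) -> Optional[CryptoAttr]:
    """Parse one unfolded a=crypto or a=rcrypto line.

    Returns None for any other SDP line.
    """
    m = _ATTR.match(line.strip())
    if m is None:
        return None
    name, tag, suite, key_params, rest = m.groups()
    ssrcs = [int(s) for s in re.findall(r"\bSSRC=(\d+)", rest)]
    direction = "recv" if name == "rcrypto" else "send"
    return CryptoAttr(direction, int(tag), suite, key_params, ssrcs)


if __name__ == "__main__":
    key = "inline:AmO4q1OVAHNiYRj6HmS3JFWNCFqSpTqHWKKIN1Mw|2^20|1:32"
    line = "a=rcrypto:1 AES_CM_128_HMAC_SHA1_80 " + key + " SSRC=3289"
    print(parse_crypto_line(line))
```
]]></artwork>
          </figure>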

          <t>The full SDP, including the keying information, is then sent to
          the ESC. The keying information MUST be encrypted and integrity
          protected. Existing mechanisms such as <xref
          target="RFC3261">S/MIME</xref> and <xref
          target="I-D.ietf-sip-sips">SIPS</xref> or SIP over TLS (on all hops
          per administrative means) MAY be used to achieve this goal, or other
          mechanisms may be defined.</t>

          <t><list style="hanging">
              <t hangText="[[">ISSUE-3: if an endpoint is receiving multiple
              incoming streams from multiple endpoints, it will have
              negotiated different keys with each of them, and all of that
              traffic is coming to the same transport address on the endpoint.
              Thus, we need a way to describe the different keys we're using
              to/from different transport addresses. One solution is to
              indicate the remote transport address. Indicating the remote
              SSRC is insufficient for this task, as several SRTP keying
              mechanisms do not include SSRC in their signaling (DTLS-SRTP,
              ZRTP, Security Descriptions). <vspace blankLines="1" />For
              example, if there were two remote peers with different keys, we
              could signal it like this:<figure anchor="Issue_example_SDP"
                  title="Strawman solution">
                  <preamble></preamble>

                  <artwork><![CDATA[    a=crypto:1 AES_CM_128_HMAC_SHA1_80
      inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32
      192.0.2.1:5678 SSRC=1899 SSRC=3892
    a=rcrypto:1 AES_CM_128_HMAC_SHA1_80
      inline:AmO4q1OVAHNiYRj6HmS3JFWNCFqSpTqHWKKIN1Mw|2^20|1:32
      192.0.2.1:5678 SSRC=3289 SSRC=2813
    a=crypto:1 AES_CM_128_HMAC_SHA1_80
      inline:GdUJShpX1ZLEw6UzF3WSJjNzB4d1BINUAv+PSdFc|2^20|1:32
      192.0.2.222:2893
    a=rcrypto:1 AES_CM_128_HMAC_SHA1_80
      inline:6UzF3IN1ZLEwAv+PSdFcWUGdUJShpXSJjNzB4d1B|2^20|1:32
      192.0.2.222:2893]]></artwork>

                  <postamble></postamble>
                </figure></t>

              <t hangText="]]"></t>
            </list></t>
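
          <t>If a syntax along the lines of the strawman in ISSUE-3 were
          adopted, a recipient could index the sending and receiving keys by
          remote transport address. The following non-normative Python
          sketch shows one way to do so. It assumes unfolded attribute
          lines, IPv4 addresses only, and the token order used in the
          strawman; none of this is settled by this document, and the name
          keys_by_peer is hypothetical.</t>

          <figure title="Illustrative per-peer key table (non-normative)">
            <artwork><![CDATA[
```python
import re
from typing import Dict, Iterable, Tuple

Peer = Tuple[str, int]  # (remote IPv4 address, remote port)

# Assumed strawman layout: name, tag, suite, key-params, ip:port.
_LINE = re.compile(
    r"a=(r?crypto):\d+\s+\S+\s+(inline:\S+)"
    r"\s+(\d+\.\d+\.\d+\.\d+):(\d+)"
)


def keys_by_peer(lines: Iterable[str]) -> Dict[Peer, Dict[str, str]]:
    """Map each remote transport address to its key parameters.

    The inner dict uses "send" for crypto and "recv" for rcrypto.
    Lines without a remote transport address are skipped.
    """
    table: Dict[Peer, Dict[str, str]] = {}
    for line in lines:
        m = _LINE.match(line.strip())
        if m is None:
            continue
        name, key, ip, port = m.groups()
        direction = "recv" if name == "rcrypto" else "send"
        table.setdefault((ip, int(port)), {})[direction] = key
    return table
```
]]></artwork>
          </figure>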
        </section>

        <section title="Scenarios and Call Flows">
          <t>The following scenarios and call flows depict the assumptions for
          the provision of media key disclosure. <xref
          target="topology"></xref> shows the general setup within the home
          domain of the client. Note that the authors assume that the client
          discloses media keys only to an entity in the client's home
          network rather than to an arbitrary entity in the visited
          network.</t>

          <figure anchor="topology" title="Network Topology">
            <artwork><![CDATA[
 +----------+ +-------+ +---------+ +--------+ +----------+
 | SIP User | |  SIP  | |SIP Proxy| | Media  | |   SIP    |
 |Agent(EPA)| | Proxy | |  (ESC)  | |Recorder| |User Agent|
 +----------+ +-------+ +---------+ +--------+ +----------+
       |          |          |           |           |
       +----------+----------+-----------+-----------+]]></artwork>
          </figure>

          <t>Based on this setup there are different options to realize the
          key disclosure, depending on the environment. Two approaches are
          distinguished in the following.</t>

          <t><list style="hanging">
              <t hangText="Publishing media keys to the ESC"><vspace
              blankLines="1" /> This requires that the configuration
              management provides the ESC configuration data (e.g.,
              certificate, policy) in a secure way to the client. As stated
              above, this configuration is outside the scope of this document,
              but an example can be found in <xref
              target="I-D.ietf-sipping-config-framework"></xref>. The key
              disclosure in this approach uses the PUBLISH method to disclose
              the key to the ESC according to a given policy. <vspace
              blankLines="1" /> <figure anchor="fig-message-flow-publishing"
                  title="Message Flow showing Publishing of Media Keys to ESC">
                  <artwork><![CDATA[
 +----------+ +-------+ +---------+ +--------+ +----------+
 | SIP User | |  SIP  | |SIP Proxy| | Media  | |   SIP    |
 |Agent(EPA)| | Proxy | |  (ESC)  | |Recorder| |User Agent|
 +----------+ +-------+ +---------+ +--------+ +----------+
      |           |           |          |          |
      |-REGISTER->|           |          |          |
      |<-200 OK---|           |          |          |
      |           |           |          |          |
      |--INVITE-->|-------------INVITE------------->|
      |<-200 OK---|<------------200 OK------------- |
      |           |           |          |          |
      |<====SRTP in both directions================>|
      |           |           |          |          |
      |-PUBLISH-->|-PUBLISH-->|-key----->|          |
      |<-200 OK---|<--200 OK--|          |          |
      ]]></artwork>
                </figure> <vspace blankLines="1" />Note that the protocol
              between the ESC and the recorder is out of scope of this
              document.</t>

              <t hangText="Using SAML assertions for ESC contact"><vspace
              blankLines="1" /> In this approach authorization is provided via
              a SAML assertion (see <xref target="I-D.ietf-sip-saml"></xref>)
              indicating which ESC is allowed to perform call recording of a
              single call or a set of calls, depending on the content of the
              assertion. Here a SAML assertion is provided as part of the
              SUBSCRIBE message, sent from the ESC to the client. The
              assertion needs to provide at least the call relation, or a time
              interval for which media recording is going to be performed. The
              SAML assertion is signed with the private key associated with
              the domain certificate, which is in the possession of the
              authentication service. The call flow would look like the
              following:
              <vspace blankLines="1" /> <figure anchor="fig-publish-saml"
                  title="Message Flow Showing Publication using SAML">
                  <artwork><![CDATA[
 +----------+ +-------+ +---------+ +--------+ +----------+
 | SIP User | |  SIP  | |SIP Proxy| | Media  | |   SIP    |
 |Agent(EPA)| | Proxy | |  (ESC)  | |Recorder| |User Agent|
 +----------+ +-------+ +---------+ +--------+ +----------+
      |           |           |          |          |
      |-REGISTER->|           |          |          |
      |<-200 OK---|           |          |          |
      |           |           |          |          |
      |<-SUBSCRIBE (SAML as.)-|          |          |
      |           |           |          |          |
      |--INVITE-->|-------------INVITE------------->|
      |<-200 OK---|<------------200 OK------------- |
      |           |           |          |          |
      |<====SRTP in both directions================>|
      |           |           |          |          |
      |--NOTIFY (SRTP data)-->|          |          |
      |           |           |          |          |
      ]]></artwork>
                </figure></t>
            </list></t>
        </section>
      </section>
    </section>

    <section title="Grammar">
      <t>[[Grammar will be provided in a subsequent version of this
      document.]]</t>
    </section>

    <section title="Security Considerations">
      <t></t>

      <section title="Incorrect ESC">
        <t>Insertion of an incorrect public key for the SRTP ESC will result
        in disclosure of the SRTP session key to an unauthorized party. Thus,
        the UA's configuration MUST be protected to prevent such
        misconfiguration, and access to the configuration on the end device
        MUST be suitably protected against unauthorized changes.</t>
      </section>

      <section anchor="disclosing_srtp_session_key"
               title="Risks of Sharing SRTP Session Key">
        <t>A party authorized to obtain the SRTP session key can listen to the
        media stream and could inject data into the media stream as if it were
        either party. The alternatives are worse: disclose the device's
        private key to the transcoder or media recording device, or abandon
        using secure SRTP key exchange in environments that require media
        transcoding or media recording. As we wish to promote the use of
        secure SRTP key exchange mechanisms, disclosure of the SRTP session
        key appears the least of these evils.</t>
      </section>

      <section title="Disclosure of Call Recording">
        <t>Secure SRTP key exchange techniques which implement this
        specification SHOULD provide a "disclosure flag", similar to that
        first proposed in Appendix B of <xref
        target="I-D.zimmermann-avt-zrtp"></xref>. In this way, both endpoints
        can be made aware of such recording and provide appropriate alerting
        to their users (via an audible, visual, or other indicator).</t>
      </section>

      <section title="Integrity and encryption of keying information">
        <t>The mechanism described in this specification relies on
        integrity-protecting and encrypting the keying information. There are
        well-known mechanisms to achieve that goal.</t>

        <t>Using SIPS to convey the SRTP key exposes the SRTP master key to
        all SIP proxies between the Event Publication Agent (EPA, the SIP User
        Agent) and the Event State Compositor (ESC). S/MIME allows disclosing
        the SRTP master key to only the ESC.</t>
      </section>
    </section>

    <section title="IANA Considerations">
      <t>The new SSRC extension of the "crypto" attribute and the new
      "rcrypto" attribute will be registered here.</t>
    </section>

    <section title="Examples">
      <figure anchor="sips_example" title="Example with &quot;SIPS:&quot; AOR">
        <preamble>This is an example showing a SIPS AOR for the ESC. This
        relies on the SIP network providing TLS encryption of the SRTP master
        keys to the ESC.</preamble>

        <artwork><![CDATA[
  PUBLISH sips:recorder@example.com SIP/2.0
  Via: SIP/2.0/TLS pua.example.com;branch=z9hG4bK652hsge
  To: <sips:recorder@example.com>
  From: <sips:dan@example.com>;tag=1234wxyz
  Call-ID: 81818181@pua.example.com
  CSeq: 1 PUBLISH
  Max-Forwards: 70
  Expires: 3600
  Event: srtp
  Content-Type: application/sdp
  Content-Length: ...

  v=0
  o=alice 2890844526 2890844526 IN IP4 client.atlanta.example.com
  s=-
  c=IN IP4 192.0.2.101
  t=0 0
  m=audio 49172 RTP/SAVP 0
  a=crypto:1 AES_CM_128_HMAC_SHA1_80
    inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32
  a=rcrypto:1 AES_CM_128_HMAC_SHA1_80
    inline:AmO4q1OVAHNiYRj6HmS3JFWNCFqSpTqHWKI8K1Mw|2^20|1:32
  a=rtpmap:0 PCMU/8000

]]></artwork>

        <postamble></postamble>
      </figure>

      <figure anchor="s_mime_example"
              title="Example with S/MIME-encrypted SDP">
        <preamble>This is an example showing an S/MIME-encrypted transmission
        to the recorder's AOR, recorder@example.com. The data enclosed in "*"
        is encrypted with recorder@example.com's public key.</preamble>

        <artwork><![CDATA[
  PUBLISH sip:recorder@example.com SIP/2.0
  Via: SIP/2.0/UDP pua.example.com;branch=z9hG4bK652hsge
  To: <sip:recorder@example.com>
  From: <sip:dan@example.com>;tag=1234wxyz
  Call-ID: 81818181@pua.example.com
  CSeq: 1 PUBLISH
  Max-Forwards: 70
  Expires: 3600
  Event: srtp
  Content-Type: application/pkcs7-mime;smime-type=enveloped-data;
                name=smime.p7m
  Content-Transfer-Encoding: binary
  Content-ID: 1234@atlanta.example.com
  Content-Disposition: attachment;filename=smime.p7m;
                       handling=required
  Content-Length: ...

   ******************************************************************
   * (encryptedContentInfo)                                         *
   * Content-Type: application/sdp                                  *
   * Content-Length: ...                                            *
   *                                                                *
   * v=0                                                            *
   * o=alice 2890844526 2890844526 IN IP4 client.atlanta.example.com*
   * s=-                                                            *
   * c=IN IP4 192.0.2.101                                           *
   * t=0 0                                                          *
   * m=audio 49172 RTP/SAVP 0                                       *
   * a=crypto:1 AES_CM_128_HMAC_SHA1_80                             *
   *   inline:NzB4d1BINUAvLEw6UzF3WSJ+PSdFcGdUJShpX1Zj|2^20|1:32    *
   * a=rcrypto:1 AES_CM_128_HMAC_SHA1_80                            *
   *   inline:AmO4q1OVAHNiYRj6HmS3JFWNCFqSpTqHWKI8K1Mw|2^20|1:32    *
   * a=rtpmap:0 PCMU/8000                                           *
   *                                                                *
   ******************************************************************]]></artwork>

        <postamble></postamble>
      </figure>

      <t></t>
    </section>

    <section title="Acknowledgements">
      <t>Thanks to Sheldon Davis and Val Matula for suggesting improvements to
      the document.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      &rfc2119;

      &rfc3711;

      &rfc3903;

      &rfc3261;
    </references>

    <references title="Informative References">
      &rfc3830;

      &I-D.ietf-sip-media-security-requirements;

      &rfc4568;

      &I-D.ietf-sipping-config-framework;

      &I-D.ietf-sip-sips;

      &I-D.ietf-sip-saml;

      &I-D.zimmermann-avt-zrtp;

      &rfc4117;

      &rfc4317;

      &rfc2804;

      &rfc3725;
    </references>

    <section title="Outstanding Issues">
      <t>Authors' to-do list:<list style="symbols">
          <t>Separate B2BUA function from media relay function in the call
          flows and in the text.</t>
        </list></t>
    </section>
  </back>
</rfc>