Session Initiation Protocol (SIP) Recording Metadata

Cisco Systems, Inc.
Cessna Business Park,
Kadabeesanahalli Village, Varthur Hobli,
Sarjapur-Marathahalli Outer Ring Road
Bangalore, Karnataka 560103
India
rmohanr@cisco.com

Cisco Systems, Inc.
Cessna Business Park,
Kadabeesanahalli Village, Varthur Hobli,
Sarjapur-Marathahalli Outer Ring Road
Bangalore, Karnataka 560103
India
partr@cisco.com

Cisco Systems, Inc.
1414 Massachusetts Avenue
Boxborough, MA 01719
USA
pkyzivat@cisco.com
Transport
SIPREC
Session recording is a critical requirement in many communications environments such as call centers and financial trading. In some of these environments, all calls must be recorded for regulatory, compliance, and consumer protection reasons. Recording of a session is typically performed by sending a copy of a media stream to a recording device. This document describes the metadata model as viewed by the Session Recording Server (SRS) and the Recording metadata format.
Session recording is a critical requirement in many communications environments such as call centers and financial trading. In some of these environments, all calls must be recorded for regulatory, compliance, and consumer protection reasons. Recording of a session is typically performed by sending a copy of a media stream to a recording device. This document focuses on the Recording metadata, which describes the communication session. It describes a metadata model as viewed by the Session Recording Server and the Recording metadata format, the requirements for which are described in and the architecture for which is described in .
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in . This
document only uses these key words when referencing normative
statements in existing RFCs.
Metadata element: A metadata element represents one block (class) of the metadata model.
Metadata attributes: Metadata attributes are the attributes listed in each block of the metadata model.
Metadata composition: Composition represents an owns/holds relationship, showing the Metadata elements contained in another Metadata element.
Metadata associations: Metadata associations represent the associations between different Metadata elements in the model. They use UML notation.
XML element: An XML element represents one XML schema complexType element (xs:complexType) of the XML schema.
XML attributes: An XML attribute represents one XML schema element (xs:element) of the XML schema.
Metadata is the information that describes the recorded media and the Communication Session (CS) to which they relate. The diagram below shows a model for Metadata as viewed by the Session Recording Server (SRS).
The Session Recording Client (SRC) MAY initiate the Recording Session.
Here, the Recording Session is completely independent of the Communication Session being recorded, at both the SIP dialog level and the session level.
The metadata MUST be conveyed from SRC to SRS. The metadata MUST be conveyed within the Recording Session Dialog.
Note that the metadata model captures changes that occur over the duration of the recording session. For example, if the call is transferred from one participant to another, the SRC MUST convey the change of participant and the properties of the new media stream to the SRS. Some metadata need not be conveyed explicitly from the SRC to the SRS if the SRS can obtain it contextually. For instance, the timing of RS block changes (such as Start/Stop time) may not be conveyed explicitly from the SRC to the SRS (the Date header in RS dialog SIP messages MAY provide the timing, but it is optional). In such cases, the time a change occurred may be assumed to be the time at which the SRS receives notification of the change.
This section gives an overview of the Recording Metadata Format. The media-related details of the metadata MUST be conveyed using the Session Description Protocol (SDP) . SDP attributes describe the different media formats, such as audio and video. Other metadata attributes, such as participant details, MUST be conveyed in a new recording-specific XML document of type application/rs-metadata+xml. The application/rs-metadata+xml document is linked to the metadata SDP using the SDP label attribute (a=label:xxx) referenced in .
Metadata is conveyed in the Recording Session (RS) incrementally whenever there is a change in the CS.
The Recording Metadata document is an XML document embedded as a message body. The recording element MUST be present in every recording metadata XML document; it acts as the container for all other elements in the document.
The Recording object is an XML document. It MUST have an XML declaration, and it SHOULD contain an encoding declaration in the XML declaration, e.g., "<?xml version="1.0" encoding="UTF-8"?>". If the charset parameter of the MIME content type declaration is present and differs from the encoding declaration, the charset parameter takes precedence.
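The precedence rule above can be sketched in Python. This is an illustration under stated assumptions, not part of the specification: the function name `effective_encoding` is hypothetical, the regular expression only recognizes an encoding declaration at the very start of the body (no byte order mark), and the fallback to UTF-8 reflects the requirement that every conforming application accept UTF-8.

```python
import re
from email.message import Message

# Matches the encoding declaration inside a leading XML declaration, e.g.
# <?xml version="1.0" encoding="UTF-8"?>
_XML_DECL_ENCODING = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)


def effective_encoding(content_type: str, body: bytes) -> str:
    """Pick the character encoding for a metadata body (hypothetical helper).

    The charset parameter of the MIME Content-Type wins over the XML
    encoding declaration; if neither is present, fall back to UTF-8.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset()  # lower-cased value, or None
    if charset:
        return charset
    match = _XML_DECL_ENCODING.match(body)  # only at the start of the body
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"
```

For example, a body declaring ISO-8859-1 but sent with `charset=UTF-8` in its Content-Type is decoded as UTF-8.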
Every application conforming to this specification MUST accept the UTF-8 character encoding to ensure minimal interoperability.
Syntax and semantic errors in a recording XML document have to be reported to the originator using an application-specific mechanism.
The namespace URI for elements defined by this specification is a Uniform Resource Name (URN) , using the namespace identifier 'ietf' defined by and extended by .
The URN is as follows: urn:ietf:params:xml:ns:recording
The recording element MUST contain an xmlns namespace attribute whose value is urn:ietf:params:xml:ns:recording. Exactly one recording element MUST be present in every recording metadata XML document.
The dataMode element indicates whether the XML document is a complete document or a partial update. The default value is complete.
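A minimal sketch, using Python's standard `xml.etree.ElementTree`, of creating the root element in the namespace URN given above (urn:ietf:params:xml:ns:recording) with a dataMode child. The token "partial" for a partial update is an assumption, since this section only names the default value, complete; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "urn:ietf:params:xml:ns:recording"


def new_metadata_document(data_mode: str = "complete") -> ET.Element:
    """Create a <recording> root element in the recording namespace.

    dataMode is emitted as a child element carrying "complete" (the
    default) or, as an assumed token, "partial".
    """
    if data_mode not in ("complete", "partial"):
        raise ValueError(f"unexpected dataMode: {data_mode!r}")
    ET.register_namespace("", NS)  # serialize with a default xmlns="..."
    root = ET.Element(f"{{{NS}}}recording")
    ET.SubElement(root, f"{{{NS}}}dataMode").text = data_mode
    return root


def serialize(root: ET.Element) -> bytes:
    # xml_declaration=True emits the XML declaration with encoding='UTF-8'
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Registering the namespace with an empty prefix makes the serialized root carry the namespace as a default `xmlns` attribute rather than a generated `ns0:` prefix.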
This section describes each element of the metadata model, its attributes, how the elements are associated with one another, and the XML element used for each of them.
A Recording Session element represents a SIP session created between an SRC and an SRS for the purpose of recording a Communication Session. This element is represented by the SIP RS dialog, so there is no need to reflect it in the metadata XML. A Recording Session element MAY have attributes such as:
Start/End Time - The Start and End time values MUST be derived from the Date header (if present in the SIP message) of the RS. Where the Date header is not present, the Start/End time MAY be set to the time at which the SRS receives the SIP message that sets up or disconnects the RS.
One instance of Recording Session MUST have:
Zero or more instances of Communication Session Group. Zero is allowed because the CSG is an optional metadata block, and also to accommodate persistent recording, where there may be no CSG.
Zero or more instances of Communication Session blocks.
Each CS Group MUST be associated with one or more Recording Sessions [each RS can potentially be set up by a different SRC].
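As an illustration only, an SRS might derive an RS Start/End time as described above: use the SIP Date header when present and fall back to the receipt time otherwise. The function name is hypothetical; `email.utils.parsedate_to_datetime` is used because SIP Date headers use the RFC 1123 date format, which e-mail headers also use.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def rs_event_time(date_header: Optional[str], received_at: datetime) -> datetime:
    """Time of an RS setup/teardown event as recorded by the SRS.

    Uses the SIP Date header when present and parseable; otherwise falls
    back to the time at which the SRS received the SIP message.
    """
    if date_header:
        try:
            return parsedate_to_datetime(date_header)
        except (TypeError, ValueError):
            pass  # malformed Date header: treat it as absent
    return received_at
```

Both exception types are caught because older Python versions raise `TypeError` for unparseable dates, while newer ones raise `ValueError`.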
A Communication Session Group provides association or linking of Communication Sessions.
A CS Group MUST have a Unique-ID attribute. This Unique-ID is used to group different CSs that are related. The SRC (or possibly the SRS) MUST ensure the uniqueness of the Unique-ID when multiple SRCs interact with the same SRS. The mechanism by which the SRC groups CSs is outside the scope of SIPREC. A Communication Session Group MUST be associated with RSs and CSs in the following manner:
There can be one or more Recording Session elements per Communication Session Group.
Each Communication Session Group MUST be associated with one or more RSs [each RS can potentially be set up by a different SRC].
There MAY be one or more Communication Sessions per CS Group [e.g., consult transfer].
Each CS MAY be associated with zero or one CS Group.
The group element is an optional element that provides information about the communication session group.
Each communication session group (CSG) is represented by one group element. Each group element has a unique URN UUID attribute that uniquely identifies the CSG.
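A sketch of generating a group element with a fresh URN UUID. The attribute name `id` is a hypothetical placeholder, because the schema is not reproduced in this section; random (version 4) UUIDs are one way, not the only way, for several SRCs sharing one SRS to avoid collisions without coordination.

```python
import uuid
import xml.etree.ElementTree as ET

NS = "urn:ietf:params:xml:ns:recording"


def new_group_element(parent: ET.Element) -> ET.Element:
    """Append a <group> element identified by a fresh URN UUID.

    The attribute name "id" is an assumption made for illustration.
    """
    group = ET.SubElement(parent, f"{{{NS}}}group")
    # uuid4().urn yields "urn:uuid:" followed by the 36-character UUID.
    group.set("id", uuid.uuid4().urn)
    return group
```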
A Communication Session block/element in the metadata model represents a Communication Session and the properties of it that are needed, as seen by the SRC. A Communication Session block MUST have the following attributes:
Termination Reason - The reason why the CS was terminated. The communication session MAY contain a Call Termination Reason, which MAY be derived from the SIP Reason header of the CS.
CS Identifier - Uniquely identifies a CS.
Start Time - This optional attribute represents the CS start time.
End Time - This optional attribute represents the CS end time.
Attributes that are primarily related to policy, such as Retention (the duration for which the Media Streams of the CS need to be retained), Force Deletion, and Access Information, will not be passed in metadata from the SRC to the SRS. However, in implementations where the SRC has enough information, these could be sent as Extension Data attached to the CS.
A Communication Session MUST be associated with CS Group, Participant, Media Stream, and Recording Session blocks. Cardinalities between CS and Participant allow:
A CS to have at least two participants.
A Participant to be associated with one or more CSs. This may even include participants who are not directly part of any CS. An example is participants in a premixed media stream: the SRC may have knowledge of such Participants yet have no signaling relationship with them. This might arise if one participant in the CS is a conference focus. Another case is a UA in the CS that works in 3pcc mode to acquire a Music on Hold (MoH) media stream; this might be reflected as a unique source for a media stream without a reported signaling relationship to it. In all these cases, if the SRC can learn enough information about the Participant, the Participant MUST be associated with the CS.
The model also allows participants in a CS that are not participants in the media. An example is the identity of a 3pcc controller that has initiated a CS to two or more participants of the CS. Another example is the identity of a conference focus: a focus is probably in the media, but since it may be there only as a mixer, it may not report itself as a participant in any of the media streams.
Cardinalities between CS and Media Stream allow:
A CS to have zero or more Streams.
A stream can be associated with one or more CSs. An example is a multicast MoH stream, which might be associated with many CSs. Also, if a B2BUA were considered to have a separate CS on each "side", the two CSs might share a stream (though more likely this would be treated as a single CS).
Cardinalities between CS and RS allow:
One instance of RS MUST have zero or more instances of Communication Session blocks.
Each CS MUST be associated with one or more RSs [each RS can potentially be set up by a different SRC].
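The CS cardinality rules above can be checked mechanically. The sketch below uses a hypothetical in-memory representation (the class and field names are not from this document) and reports violations of the "at least two participants" and "one or more RSs" rules. "Zero or more streams" needs no check, and "zero or one CS Group" is enforced by the optional `group_id` field.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CommunicationSession:
    """Hypothetical in-memory view of one CS block and its associations."""
    cs_id: str
    participant_ids: List[str] = field(default_factory=list)
    stream_ids: List[str] = field(default_factory=list)  # zero or more
    recording_session_ids: List[str] = field(default_factory=list)
    group_id: Optional[str] = None  # zero or one CS Group


def cardinality_errors(cs: CommunicationSession) -> List[str]:
    """Return the cardinality rules of the model that cs violates."""
    errors = []
    if len(set(cs.participant_ids)) < 2:
        errors.append("a CS needs at least two participants")
    if not cs.recording_session_ids:
        errors.append("a CS must be associated with one or more RSs")
    return errors
```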
The session element provides information about the communication session.
Each communication session (CS) has one session element. Each session element has a unique URN UUID attribute that uniquely identifies the CS.
The reason element MAY be included to indicate the reason for termination. The group-ref element MAY be present to indicate the group to which the session belongs.
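A sketch of a session element carrying the optional group-ref and reason elements. The `id` attribute name and the use of plain text content for reason and group-ref are assumptions made for illustration, not taken from the schema.

```python
import uuid
import xml.etree.ElementTree as ET
from typing import Optional

NS = "urn:ietf:params:xml:ns:recording"


def new_session_element(parent: ET.Element, group_urn: Optional[str] = None,
                        reason: Optional[str] = None) -> ET.Element:
    """Append a <session> element, optionally linked to a CS group.

    The "id" attribute name and text-only content are assumptions.
    """
    session = ET.SubElement(parent, f"{{{NS}}}session")
    session.set("id", uuid.uuid4().urn)
    if group_urn is not None:
        # group-ref carries the URN UUID of an existing <group> element.
        ET.SubElement(session, f"{{{NS}}}group-ref").text = group_urn
    if reason is not None:
        # e.g. a value derived from the SIP Reason header of the CS
        ET.SubElement(session, f"{{{NS}}}reason").text = reason
    return session
```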
A Participant block has information about a device that is part of a CS and/or contributes or consumes media stream(s) belonging to a CS. A Participant has attributes such as:
AoR list - A list of AoRs. An AoR MAY be a SIP, SIPS, or TEL URI. There MAY be cases where a participant has more than one AoR [e.g., a P-Asserted-Identity that carries both SIP and TEL URIs].
Name - The Participant name (SIP display name) or DN number (if known).
Other attributes [such as Participant Role and Participant Type] MAY be carried from the SRC to the SRS as extension data of the Participant.
Cardinalities between Participant and Media Stream allow:
A Participant to receive zero or more media streams.
A Participant to send zero or more media streams (the same participant may provide multiple streams, e.g., audio and video).
A media stream to be received by zero or more participants. It is possible, though perhaps unlikely, that a stream is generated but sent only to the SRC and SRS and not to any participant, e.g., in conferencing where all participants are on hold and the SRC is collocated with the focus. A media stream may also be received by multiple participants (e.g., whisper calls, side conversations).
A media stream to be sent by one or more participants (pre-mixed streams).
An example of a case where a participant may receive zero or more streams: a Supervisor may have a side conversation with an Agent while the Agent converses with a customer.
The participant element provides information about a specific participant involved in the recording.
There MUST be at least two participants in any given session. The "send" and "recv" elements of each participant associate SDP m-lines with the participant. The send element indicates that the participant is sending the media stream with the referenced media description. The recv element indicates that the participant is receiving the stream; by default, all participants receive the stream. The recv element is relevant in scenarios such as whisper calls, in which only some of the participants in the session receive the stream.
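One way to read the default-receive rule and the whisper-call exception is sketched below. The interpretation is an assumption, since this section does not define it precisely: when no participant lists a stream in recv, every participant other than the stream's sender receives it; when at least one participant does, only those participants receive it. The data layout and function name are hypothetical.

```python
from typing import Dict, Set


def receivers(stream: str, participants: Dict[str, Dict[str, Set[str]]]) -> Set[str]:
    """Work out which participants receive a given stream label.

    participants maps a participant name to {"send": {...}, "recv": {...}},
    each a set of stream labels (hypothetical layout).
    """
    explicit = {p for p, s in participants.items() if stream in s["recv"]}
    if explicit:
        return explicit  # whisper-style restriction
    # Default: everyone except the sender(s) of the stream receives it.
    return {p for p, s in participants.items() if stream not in s["send"]}
```

In a whisper scenario, the supervisor's stream is listed only in the agent's recv set, so the customer does not receive it.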
A participant MUST have an AOR element containing a SIP/SIPS URI that identifies the participant. The AOR element is a SIP/SIPS URI (with an FQDN or IP address) that represents the user. The name element is optional and represents the display name.
Each participant element has a unique URN UUID attribute that uniquely identifies the participant, and a session URN UUID that associates the participant with a specific session element. The URN UUID of a participant MUST be used within the scope of the CSG: no new URN UUID is to be created for the same element (participant, stream) across different CSs in the same CSG. If a URN UUID is to be used permanently, how it is mapped to the original AoR is left to the implementer.
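A sketch of an SRC-side registry that reuses a participant's URN UUID across CSs of the same CSG. Keying on the CSG identifier and the AoR string is an assumption; a real implementation would compare AoRs using SIP URI comparison rules rather than exact string equality. The class name is hypothetical.

```python
import uuid
from typing import Dict, Tuple


class ParticipantIds:
    """Hands out participant URN UUIDs that are stable within one CSG."""

    def __init__(self) -> None:
        self._ids: Dict[Tuple[str, str], str] = {}

    def urn_for(self, csg_id: str, aor: str) -> str:
        # Exact-string AoR match: a simplification of SIP URI comparison.
        key = (csg_id, aor)
        if key not in self._ids:
            self._ids[key] = uuid.uuid4().urn
        return self._ids[key]
```

The same AoR seen in two CSs of one group keeps its URN UUID, while the same AoR in a different group receives a new one.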
A Media Stream block MUST have the properties of the media as seen by the SRC and sent to the SRS. A new instance of the Media Stream block is created whenever there is a change in the media (e.g., a direction change such as pause/resume, a codec change, and/or a participant change). A Media Stream block MUST have the following attributes:
Start Time - The Media start time at the SRC.
End Time - The Media end time at the SRC. This is an optional attribute and MAY be included after a stream ends.
Media Stream Reference - In implementations, this can reference an m-line.
Content - The content of an MS element is described in terms of a value from the RFC 4796 registry.
NOTE: How the content attribute is conveyed (in the metadata XML or in the RS SDP) is still open.
The metadata model should include media streams that are not being delivered to the SRS. Examples include cases where the SRC offered certain media types but the SRS chose to accept only a subset of them, or where an SRC does not even offer a certain media type due to its recording restrictions.
A Media Stream MUST be associated with a Participant and a CS. The details of the association with the Participant are described in the Participant block section; the details of the association with the CS are described in the CS section.
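The rule that a change in media creates a new Media Stream instance can be sketched as follows. The field and function names are hypothetical; the sketch closes the current instance with an End Time and opens a new one starting at the same moment, carrying the changed properties (here, a direction change on hold).

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class MediaStream:
    """One instance of the Media Stream block (hypothetical field names)."""
    label: str         # references the SDP m-line via a=label
    direction: str     # e.g. "sendrecv", or "inactive" while on hold
    codec: str
    start: datetime
    end: Optional[datetime] = None


def apply_change(history: List[MediaStream], when: datetime, **changes) -> None:
    """Close the current instance and open a new one carrying the change."""
    current = history[-1]
    history[-1] = replace(current, end=when)
    history.append(replace(current, start=when, end=None, **changes))
```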
The stream element indicates the SDP media lines associated with the session and participants.
This element indicates SDP m-line properties such as the label attribute and the media mode. The label links the stream element to an m-line of the SDP body through the a=label attribute of that m-line. The media mode indicates whether or not the media is mixed.
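Because the stream element is linked to the RS SDP through the label, an SRS needs to find the m-line carrying a given a=label value. A minimal sketch, assuming a well-formed SDP body; the function name is hypothetical and the parser only tracks m= and a=label: lines.

```python
from typing import Dict, Optional


def m_lines_by_label(sdp: str) -> Dict[str, str]:
    """Map each a=label value in an SDP body to the m-line it belongs to.

    A media-level a=label attribute applies to the most recent m= line;
    labels appearing before any m= line are skipped.
    """
    result: Dict[str, str] = {}
    current_m: Optional[str] = None
    for line in sdp.splitlines():  # handles CRLF line endings
        line = line.strip()
        if line.startswith("m="):
            current_m = line
        elif line.startswith("a=label:") and current_m is not None:
            result[line[len("a=label:"):]] = current_m
    return result
```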
Each stream element has a unique URN UUID attribute that uniquely identifies the stream, and a session URN UUID that associates the stream with a specific session element. An open item is whether to use a URN UUID (global ID) or an xml:id (local ID).
A recording metadata object can contain additional data not specified as part of SIPREC. This is intended to accommodate future standards-track extensions as well as vendor- and user-specific extensions. The mechanism MUST provide a means of unambiguously distinguishing such extension data. An Extension data element MUST be contained/owned by a Metadata element.
Each instance of a Metadata element (except the Extension data element itself) MAY have zero or more instances of the Extension data element.
Each Extension data element MUST be contained/owned by a Metadata element other than itself.
The extensiondata element provides the mechanism by which the namespace/element MAY be extended with standard or proprietary information.
The extensiondata element MUST contain elements from XML namespaces other than the recording namespace; multiple namespaces MAY exist under one extensiondata element. An extensiondata element can appear at each level (recording, session, participant, stream) to carry extension data specific to that element. The extensiondata element MUST be part of the parent element for which the additional information is sent, and hence no Unique ID is needed.
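A sketch of attaching vendor data to a participant through extensiondata, for example the Participant Role mentioned in the Participant block. The namespace URI http://example.com/ns/recording-ext and the function name are hypothetical; the vendor element names are supplied by the caller.

```python
import xml.etree.ElementTree as ET

NS = "urn:ietf:params:xml:ns:recording"
VENDOR_NS = "http://example.com/ns/recording-ext"  # hypothetical namespace


def add_extension(parent: ET.Element, name: str, value: str) -> ET.Element:
    """Attach vendor data to parent inside a single <extensiondata> child.

    The extension element lives in its own namespace, so an SRS that does
    not understand VENDOR_NS can tell it apart from recording metadata.
    """
    ext = parent.find(f"{{{NS}}}extensiondata")
    if ext is None:
        ext = ET.SubElement(parent, f"{{{NS}}}extensiondata")
    item = ET.SubElement(ext, f"{{{VENDOR_NS}}}{name}")
    item.text = value
    return item
```

Calling it twice on the same participant, for instance with hypothetical "role" and "skill" elements, places both under one extensiondata element owned by that participant.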
The start-time/stop-time element contains a string indicating the date and time of the status change of this tuple. The value of this element MUST follow the IMPP datetime format . Timestamps that contain 'T' or 'Z' MUST use the capitalized forms. Each of the group, session, participant, and stream elements MAY contain either a start-time or a stop-time, but not both at the same time.
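A sketch of formatting and checking these timestamps. It assumes the IMPP datetime format is the RFC 3339 style date-time; the regular expression is a simplified approximation that does not validate calendar ranges. Function names are hypothetical.

```python
import re
from datetime import datetime, timezone
from typing import Optional

# Simplified RFC 3339-style date-time: upper-case 'T', then 'Z' or an offset.
_IMPP_DATETIME = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})$"
)


def format_timestamp(dt: datetime) -> str:
    """Render an aware datetime in UTC, e.g. 2010-12-16T23:41:07Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def check_time_tuple(start: Optional[str], stop: Optional[str]) -> None:
    """Enforce the rules above for one group/session/participant/stream."""
    if start is not None and stop is not None:
        raise ValueError("start-time and stop-time must not both be present")
    for value in (start, stop):
        if value is not None and not _IMPP_DATETIME.match(value):
            raise ValueError(f"not a datetime with upper-case T/Z: {value!r}")
```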
As a security measure, the timestamp element SHOULD be included in all tuples unless the exact time of the status change cannot be determined.
NOTE: Open item on whether start/stop attribute is needed for all Metadata elements
The following example shows all the tuples involved in a Recording Metadata XML body.
The following example shows a partial update to the Recording Metadata XML body of the above example. It illustrates the stop time of a specific stream.
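The following Python sketch builds a partial update of this kind, reporting the stop time of one stream. The "partial" dataMode token and the `id` attribute name are assumptions, and the stream URN used when calling it is a placeholder.

```python
import xml.etree.ElementTree as ET

NS = "urn:ietf:params:xml:ns:recording"


def stream_stopped_update(stream_urn: str, stop_time: str) -> bytes:
    """Build a partial-update document announcing that one stream ended.

    Only the changed stream is carried, not the whole metadata model.
    """
    ET.register_namespace("", NS)
    root = ET.Element(f"{{{NS}}}recording")
    ET.SubElement(root, f"{{{NS}}}dataMode").text = "partial"  # assumed token
    stream = ET.SubElement(root, f"{{{NS}}}stream", {"id": stream_urn})
    ET.SubElement(stream, f"{{{NS}}}stop-time").text = stop_time
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```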
This section defines the XML schema for the Recording metadata document.
The metadata information sent from the SRC to the SRS may reveal sensitive information about the participants in a session. For this reason, it is RECOMMENDED that an SRC use strong means of authentication and metadata information protection, and that it apply comprehensive authorization rules when using the metadata format defined in this document. The following sections discuss each of these aspects in more detail.
It is RECOMMENDED that an SRC authenticate the SRS using the normal SIP authentication mechanisms, such as Digest as defined in Section 22 of . The mechanism used for conveying the metadata information MUST ensure integrity and SHOULD ensure confidentiality of the information. To achieve this, an end-to-end SIP encryption mechanism, such as S/MIME described in , SHOULD be used.
If a strong end-to-end security means (such as the above) is not available, it is RECOMMENDED that an SRC use the mutual hop-by-hop Transport Layer Security (TLS) authentication and encryption mechanisms described in "SIPS URI Scheme" and "Interdomain Requests" of .
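For the hop-by-hop TLS fallback, an SRC implemented in Python might configure its client-side context as sketched below. The function name and certificate file paths are hypothetical; SIP-specific details such as SIPS URI handling and matching the certificate identity against the SRS domain are outside this sketch.

```python
import ssl
from typing import Optional


def src_tls_context(cert_file: Optional[str] = None,
                    key_file: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS context an SRC could use toward the SRS.

    The SRS certificate and host name are always verified; when a
    certificate and key are supplied, they are presented to the SRS,
    which provides the mutual authentication recommended above.
    """
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file is not None:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```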
This specification registers a new XML namespace, and a new XML schema.
URI: urn:ietf:params:xml:ns:recording
Registrant Contact: IETF SIPREC working group, Ram mohan R (rmohanr@cisco.com)
XML: the XML schema to be registered is contained in Section 6.
Its first line is ]]> and its last line is ]]>
We wish to thank John Elwell (Siemens-Enterprise), Henry Lum (Alcatel-Lucent), Leon Portman (Nice), De Villers, Andrew Hutton (Siemens-Enterprise), Deepanshu Gautam (Huawei), Charles Eckel (Cisco), Muthu Arul (Cisco), Michael Benenson (Cisco), Hadriel Kaplan (ACME), Brian Rosen (Neustar), and Scott Orton (Broadsoft) for their valuable comments and inputs.
We wish to thank Joe Hildebrand (Cisco) and Peter Saint-Andre (Cisco) for their valuable XML-related guidance.
This section describes the metadata model object instances for different SIPREC use cases. For simplicity, since the media streams sent by each participant are received by every other participant in these use cases, the "receives" associations are not shown in the object instance diagrams below. Also for simplicity, not all attributes of each block are shown in these instance diagrams.
Basic call between two Participants A and B. In this use case, each participant sends one Media Stream. For simplicity, "receives" lines are not shown in this instance diagram. The Media Streams sent by each participant are received by all other participants of that CS.
Basic call between two Participants A and B, with Participant A or B doing a Hold/Resume. In this use case, each participant sends one Media Stream. After Hold/Resume, the properties of the Media can change. For simplicity, "receives" lines are not shown in this instance diagram. The Media Streams sent by each participant are received by all other participants of that CS.
Basic call between two Participants A and B, with Participant A transferring (consult transfer) to Participant C. In this use case, each participant sends one Media Stream. After the transfer, the properties of Participant A's Media can change. For simplicity, "receives" lines are not shown in this instance diagram. The Media Streams sent by each participant are received by all other participants of that CS.
Depending on which entity acts as the SRC and the information the SRC has, there can be several ways to model conference use cases.
This section has instance diagrams for the following cases:
A CS where one of the participants (which is also the SRC) is a user in a conference.
A CS where one of the participants is the focus (which is also the SRC).
A CS where one of the participants is a user and the SRC is a different entity, such as a B2BUA.
A CS where one of the participants is the focus and the SRC is a different entity, such as a B2BUA.
NOTE: There may be other ways to model the same use cases, depending on what information the SRC has.
This is the use case where one of the participants of the CS (which is also the SRC) is a user in a conference. For simplicity, the receive lines for each participant are not shown. In this example, two participants, A and B, are part of a Communication Session (CS). One of the participants, B, is part of a conference and also acts as the SRC. There can be two cases here: B can be a participant of the conference, or B can be the focus. In this instance diagram, Participant B is a user in a conference. The SRC (Participant B) subscribes to the conference event package to get the details of the other participants and sends them through the metadata to the SRS. In this instance diagram, the Media Stream (mixed stream) sent from Participant B has media streams contributed by the conference participants (D, E, F, and G). For simplicity, the "receives" line is not shown here. In this example, the media stream sent by each participant (A or B) of the CS is received by the other participant.
This is the use case where one of the participants of the CS is the focus (which is also the SRC). In this example, participants A and B are part of a Communication Session (CS) with Participant C, which is the focus of a conference and also acts as the SRC. The SRC (Participant C), being the focus of the conference, has access to the details of the other participants and sends them through the metadata to the SRS.
In this instance diagram, the Media Stream (mixed stream) sent by C has media streams contributed by the conference participants (A, B, D, and E). Participants A, B, D, and E send Media Streams A1, B1, D1, and E1, respectively. The media stream sent by Participant C (the focus) is received by all other participants of the CS. For simplicity, the "receives" line is not shown linked to all other participants.
NOTE: The SRC (Participant C) can send a mixed stream or separate streams to the SRS.
A CS where one of the participants is a user and the SRC is a different entity, such as a B2BUA. In this case, the SRC may not know that one of the users is part of a conference. Hence, the instance diagram will not have information about the conference participants.
A CS where one of the participants is the focus and the SRC is a different entity, such as a B2BUA. In this case, the participant that is the focus sends "isfocus" in a SIP message to the SRC. On seeing this "isfocus", the SRC subscribes to the conference event package, learns the details of the other conference participants from it, and sends them in the metadata to the SRS. The instance diagram for this use case is the same as Case 1.