Internet Engineering Task Force                                  AVT WG
Internet Draft                                             Mark Handley
draft-ietf-avt-germ-00.txt                                          ISI
                                                      November 11, 1998
                                                     Expires: May, 1999


                    GeRM: Generic RTP Multiplexing

STATUS OF THIS MEMO

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as ``work in progress''.

   To learn the current status of any Internet-Draft, please check the
   ``1id-abstracts.txt'' listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited.

ABSTRACT

   This document describes GeRM, an RTP payload format for generic
   multiplexing of multiple RTP streams.

   This document is a product of the Audio/Video Transport (AVT) working
   group of the Internet Engineering Task Force. Comments are solicited
   and should be addressed to the working group's mailing list at
   rem-conf@es.net and/or the author.

1 Introduction

   When RTP[1] is used for end-to-end communication, each RTP data
   stream in a session should be sent separately to a different UDP
   port. This allows heterogeneous treatment of the streams by the
   network. For example, in a multimedia conference, we may be willing
   to pay to make an RSVP[2] reservation for the audio, but unable to
   reserve sufficient bandwidth for the video. Thus in the general case,
   we argue that multiplexing multiple RTP streams together should be
   avoided.
   However, there are circumstances when this general rule may not make
   a great deal of sense. If a stream is very low bandwidth, but needs
   low latency, the overhead of RTP packetisation may be too large. On
   slow modem links this can be overcome by using IP/UDP/RTP header
   compression [3], but this is a hop-by-hop compression scheme, and so
   is unsuitable for congested high-speed backbone links.

   MPEG-4 is an example of a codec that produces multiple elementary
   streams that together comprise a single video stream. Many of these
   elementary streams are very low bandwidth. It makes little sense to
   packetise each of these elementary streams separately and send it to
   its own RTP/UDP port. Instead a network-aware multiplexing layer is
   required that can combine multiple elementary stream data units into
   a single RTP packet in a way that does not reduce resilience to
   packet loss[4].

   Another example is that of IP telephony gateways. In such a gateway,
   incoming PSTN calls are packetised over RTP and transmitted to a
   remote gateway, where they are turned back into PSTN calls again.
   Between any pair of gateways there may be many simultaneous telephone
   calls. If a relatively low bitrate codec is used such as GSM (approx
   14Kbps), each of these flows then gains its own IP, UDP and RTP
   headers totalling 40 bytes. With 20ms packetisation each packet
   carries only about 35 bytes of payload, so the header overhead is
   over 100%. In this case, many (if not all) of the flows expect the
   same network service. The header overhead can be significantly
   reduced if multiple unrelated flows are multiplexed together into a
   single RTP packet.

   In both these cases and other similar ones, we could design specific
   multiplexing protocols that each address one particular problem
   domain. Rather than do this, we propose a multiplexing protocol that
   attempts to be generic. Any pair of RTP flows with the same source
   and destination may be multiplexed together.
   The degree of compression depends on the similarity of the two flows,
   but the per-flow overhead is always less than a single RTP header
   (without IP or UDP), and is typically much better.

2 Specification

   The approach taken in GeRM is similar to that taken with IP/UDP/RTP
   header compression, in that only differences between one packet and
   the next are encoded. However, unlike IP/UDP/RTP header compression,
   GeRM does this by only encoding the differences between the RTP
   headers of the different payloads in the same multiplexed packet, and
   all RTP header state is reinitialised in each new packet. As a result
   GeRM can function effectively across multiple network hops.

   The basic model is that a single IP packet contains multiple RTP
   headers each followed by its own payload. Each of these RTP headers
   followed by its payload is referred to as a sub-packet. GeRM
   compresses each sub-packet header, so that fields which are
   predictable between one sub-header and the next sub-header within the
   same packet are not sent.

   Each multiplexed RTP packet has a full RTP header which contains the
   SSRC, sequence number, timestamp, etc. corresponding to the first
   sub-packet payload, but the RTP payload type field is set to a value
   indicating this is a GeRM packet. The first sub-packet header will
   compress out completely except for the payload-type field and length
   because the full RTP header and the sub-packet header only differ in
   the payload-type. The second sub-packet header is then encoded based
   on predictable differences between the original RTP header for that
   sub-packet and the original RTP header for the first sub-packet. The
   third sub-packet header is then encoded relative to the original RTP
   header for the second sub-packet, and so forth.
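   The chaining described above can be sketched in a few lines of
   Python. This is only an illustration of which header each sub-packet
   is compared against. The names RtpHeader, GERM_PT, outer_header,
   reference_chain and differing_fields are hypothetical, the payload
   type value 96 is an arbitrary stand-in for "GeRM", and only four of
   the RTP header fields are modelled. The sketch reports raw
   differences; the per-field rules of the GeRM header decide which of
   those differences actually cost bytes.

```python
from typing import List, NamedTuple, Tuple


class RtpHeader(NamedTuple):
    """Only the fixed RTP fields needed to show the chaining model."""
    pt: int
    seq: int
    ts: int
    ssrc: int


GERM_PT = 96  # hypothetical dynamic payload type standing in for "GeRM"


def outer_header(first: RtpHeader) -> RtpHeader:
    """Outer RTP header: the first sub-packet's header with PT set to GeRM."""
    return first._replace(pt=GERM_PT)


def reference_chain(subs: List[RtpHeader]) -> List[Tuple[RtpHeader, RtpHeader]]:
    """Pair every sub-packet header with the header it is encoded against.

    Sub-packet 1 is compared with the outer header, sub-packet 2 with the
    original header of sub-packet 1, and so on.
    """
    refs = [outer_header(subs[0])] + subs[:-1]
    return list(zip(refs, subs))


def differing_fields(ref: RtpHeader, cur: RtpHeader) -> List[str]:
    """Names of the modelled fields whose values differ from the reference."""
    return [name for name in cur._fields if getattr(ref, name) != getattr(cur, name)]


# Three made-up sub-packet headers, all with payload type 3.
subs = [
    RtpHeader(pt=3, seq=100, ts=8000, ssrc=0x1001),
    RtpHeader(pt=3, seq=205, ts=8000, ssrc=0x1002),
    RtpHeader(pt=3, seq=206, ts=8160, ssrc=0x2002),
]
for ref, cur in reference_chain(subs):
    print(differing_fields(ref, cur))
# ['pt']
# ['seq', 'ssrc']
# ['seq', 'ts', 'ssrc']
```

   The first line of output mirrors the statement that the first
   sub-packet differs from the outer header only in its payload type.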
   A regular RTP header has the following format:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                             ....                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   A GeRM header consists of one byte followed by any RTP fields that
   are not predictable from the previous header. The parts of the RTP
   header corresponding to the bits of the GeRM header are as follows:

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | V |P|X|  CC   |M|     PT      |       sequence number         |
   |0 0|0|0|0 0 0 0|1|2 2 2 2 2 2 2|3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   |4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   |5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 6 6 6 6 6 6 6 6|
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |            contributing source (CSRC) identifiers             |
   |                        not compressed                         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The GeRM Header is one byte:

     0  1  2  3  4  5  6  7
   +--+--+--+--+--+--+--+--+
   |B0|B1|B2|B3|B4|B5|B6|B7|
   +--+--+--+--+--+--+--+--+

   The meaning of these bits is:

   B0:
      -zero indicates that the first byte of the original RTP header
       remains unchanged from the original RTP header in the previous
       sub-packet (or outer RTP header if there's no previous
       sub-packet in this packet). I.e., V, P, X and CC are unchanged.
      -one indicates that the first byte (byte 1) of the original RTP
       header immediately follows the GeRM header.

   B1:
      Contains the marker bit from the sub-packet's RTP header.

   B2:
      -zero indicates that the payload type remains unchanged.
      -one indicates that the payload type field follows the GeRM
       header and any byte 1 field that may be present. Although PT is
       a seven bit field, it is added as an eight bit field. Bit 0 of
       this byte MUST be zero.

   B3:
      -zero indicates that the sequence number remains unchanged.
      -one indicates that the 16 bit sequence number field follows the
       GeRM header and any byte 1 or PT field that may be present.

   B4:
      -zero indicates that the timestamp remains unchanged.
      -one indicates that the 32 bit timestamp field follows the GeRM
       header and any byte 1, PT or sequence number field that may be
       present.

   B5:
      -zero indicates that the most significant 24 bits of the SSRC
       remain unchanged.
      -one indicates that the most significant 24 bits of the SSRC
       follow the GeRM header and any byte 1, PT, sequence number or
       timestamp field that may be present.

   B6:
      -zero indicates that the least significant 8 bits of the SSRC are
       one higher than those of the preceding SSRC.
      -one indicates that the least significant 8 bits of the SSRC
       follow the GeRM header and any byte 1, PT, sequence number,
       timestamp or MSB SSRC fields that may be present.

   B7:
      -zero indicates that the sub-packet length in bytes (ignoring the
       sub-packet header) is unchanged from the previous sub-packet.
      -one indicates that the sub-packet length (ignoring the
       sub-packet header) follows all the other GeRM header fields as
       an 8-bit unsigned integer length field.

   Any CSRC fields present in the original RTP header then follow the
   GeRM headers.

3 Examples

   In this section we attempt to characterise the likely behaviour of
   GeRM in some typical circumstances.
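   As a concrete reference for the examples that follow, the sub-header
   layout of Section 2 can be sketched in Python. This is a minimal
   sketch under stated assumptions, not a reference implementation: the
   names Germ, _LAYOUT, pack_subheader and unpack_subheader are
   hypothetical; B0 is assumed to be the most significant bit of the
   GeRM byte, following the usual network bit numbering; CSRC fields
   and the prediction rules themselves are left out. The payload type 3
   and the 33-byte frame length used for GSM come from the RTP
   audio/video profile rather than from this document.

```python
from enum import IntFlag
from typing import Dict, Tuple


class Germ(IntFlag):
    """GeRM header bits B0..B7, assuming B0 is the most significant bit."""
    BYTE1 = 0x80    # B0: first RTP byte (V, P, X, CC) follows
    MARKER = 0x40   # B1: marker bit, carried in the GeRM byte itself
    PT = 0x20       # B2: 8-bit payload type follows
    SEQ = 0x10      # B3: 16-bit sequence number follows
    TS = 0x08       # B4: 32-bit timestamp follows
    SSRC_HI = 0x04  # B5: most significant 24 bits of SSRC follow
    SSRC_LO = 0x02  # B6: least significant 8 bits of SSRC follow
    LENGTH = 0x01   # B7: 8-bit sub-packet length follows


# Optional fields in the order they appear after the GeRM byte.
_LAYOUT = [
    (Germ.BYTE1, "byte1", 1),
    (Germ.PT, "pt", 1),
    (Germ.SEQ, "seq", 2),
    (Germ.TS, "ts", 4),
    (Germ.SSRC_HI, "ssrc_hi", 3),
    (Germ.SSRC_LO, "ssrc_lo", 1),
    (Germ.LENGTH, "length", 1),
]


def pack_subheader(flags: Germ, **fields: int) -> bytes:
    """Serialise the GeRM byte followed by every field whose flag is set.

    The caller is responsible for keeping pt below 128 so that bit 0 of
    the payload type byte stays zero.
    """
    out = bytearray([int(flags)])
    for flag, name, size in _LAYOUT:
        if flags & flag:
            out += fields[name].to_bytes(size, "big")
    return bytes(out)


def unpack_subheader(buf: bytes, offset: int = 0) -> Tuple[Germ, Dict[str, int], int]:
    """Return (flags, present fields, offset of the first byte after them)."""
    flags = Germ(buf[offset])
    pos = offset + 1
    fields: Dict[str, int] = {}
    for flag, name, size in _LAYOUT:
        if flags & flag:
            fields[name] = int.from_bytes(buf[pos:pos + size], "big")
            pos += size
    return flags, fields, pos


# A first sub-header that carries only payload type and length (3 bytes).
first = pack_subheader(Germ.PT | Germ.LENGTH, pt=3, length=33)
# A later sub-header that carries sequence number, timestamp and full SSRC.
later = pack_subheader(Germ.SEQ | Germ.TS | Germ.SSRC_HI | Germ.SSRC_LO,
                       seq=1000, ts=160, ssrc_hi=0x123456, ssrc_lo=0x78)
print(first.hex(), len(later))  # 210321 11
```

   Driving both functions from one table keeps the field order in a
   single place, which is the part of the format most easily got wrong.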
3.1 Arbitrary Streams, Same Payload Type

   Five RTP streams that originate at separate RTP sources (with SSRCs
   SSRC1 to SSRC5, sequence numbers SEQ1 to SEQ5, and timestamps T1 to
   T5) are being multiplexed together. They each use GSM compression,
   and the GSM codec uses fixed size frames. The compound packet is as
   follows:

      IP Header
      UDP Header
      RTP Header, V=2, P=0, CC=0, PT=>GERM, SEQ=SEQ1, TS=T1, SSRC=SSRC1
        GeRM Header, B0=0, B1=M1, B2=1, B3=0, B4=0, B5=0, B6=0, B7=1
          8-bit PT=>GSM
          8-bit length = length of GSM frame
          GSM payload 1
        GeRM Header, B0=0, B1=M2, B2=0, B3=1, B4=1, B5=1, B6=1, B7=0
          16-bit Sequence Number SEQ2
          32-bit Timestamp T2
          24+8 bit SSRC SSRC2
          GSM payload 2
        ...
        GeRM Header, B0=0, B1=M5, B2=0, B3=1, B4=1, B5=1, B6=1, B7=0
          16-bit Sequence Number SEQ5
          32-bit Timestamp T5
          24+8 bit SSRC SSRC5
          GSM payload 5

   The overhead is 40 bytes (IP+UDP+RTP) + 3 bytes (first sub-header) +
   11 bytes for each of the four subsequent sub-headers, or a total of
   87 bytes, as opposed to 200 bytes for separate RTP packets.

   In some cases, having a multiplex stream sequence number in the outer
   RTP packet (rather than the first payload sequence number) might be
   desirable. This might be the case if we wish to add packet-level FEC
   to the multiplexed stream. In such a case, the sequence number of
   sub-packet 1 does not compress out, adding a further two bytes
   overhead.

3.2 Cooperating PSTN-IP gateways

   If several RTP streams coded with the same codec are originating at a
   PSTN->IP gateway and all terminate at the same IP->PSTN gateway, and
   if we assume that an out-of-band signalling mechanism is used to
   communicate SSRC information at call setup time, then we can achieve
   significantly better compression.

   To do this we algorithmically generate the SSRC rather than
   allocating it randomly as specified in the RTP specification. This is
   acceptable in this context because only the remote gateway will ever
   see the SSRC.
   As consecutive flows arrive, they are given consecutive SSRCs, which
   in any event must be communicated as part of the call setup
   mechanism. All the flows are digitised and compressed at the same
   time, so they share a common clock and hence common timestamps. If no
   silence suppression is performed, the sequence numbers can be
   consecutive too, but we do not assume this.

   As flows terminate, they will leave gaps in the SSRC space. New flows
   are then allocated the now unused SSRCs to attempt to keep the SSRC
   space as contiguous as possible.

   For the sake of example, we assume we have SSRCs 1 to 3 and 5 to 10
   in use, and that the flows with SSRC 5, 7 and 8 are being silence
   suppressed. This leaves us with flows 1, 2, 3, 6, 9 and 10 to
   transmit.

      IP Header
      UDP Header
      RTP Header, V=2, P=0, CC=0, PT=>GERM, SEQ=SEQ1, TS=T1, SSRC=SSRC1
        GeRM Header, B0=0, B1=M1, B2=1, B3=0, B4=0, B5=0, B6=0, B7=1
          8-bit PT=>GSM
          8-bit length = length of GSM frame
          GSM payload 1
        GeRM Header, B0=0, B1=M2, B2=0, B3=1, B4=0, B5=0, B6=0, B7=0
          16-bit Sequence Number SEQ2
          GSM payload 2
        GeRM Header, B0=0, B1=M3, B2=0, B3=1, B4=0, B5=0, B6=0, B7=0
          16-bit Sequence Number SEQ3
          GSM payload 3
        GeRM Header, B0=0, B1=M6, B2=0, B3=1, B4=0, B5=0, B6=1, B7=0
          16-bit Sequence Number SEQ6
          8-bit LSByte of SSRC 6
          GSM payload 6
        GeRM Header, B0=0, B1=M9, B2=0, B3=1, B4=0, B5=0, B6=1, B7=0
          16-bit Sequence Number SEQ9
          8-bit LSByte of SSRC 9
          GSM payload 9
        GeRM Header, B0=0, B1=M10, B2=0, B3=1, B4=0, B5=0, B6=0, B7=0
          16-bit Sequence Number SEQ10
          GSM payload 10

   Thus the overhead is 40 bytes for the IP/UDP/RTP headers, 3 bytes for
   sub-header 1, 3 bytes each for sub-headers 2, 3, and 10 (the GeRM
   byte plus a 16-bit sequence number), and 4 bytes each for sub-headers
   6 and 9 (which also carry the 8-bit SSRC low byte). This totals 60
   bytes against 240 bytes for separate IP/UDP/RTP headers per flow.
   Typically each new flow being included in the packet will require 3
   to 4 bytes of overhead in addition to the compressed data itself.
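   The byte counts for this packet can be cross-checked against the
   field widths given in Section 2. The Python sketch below is
   illustrative only: the names FIELD_BYTES, subheader_size,
   total_overhead and PACKET_3_2 are hypothetical, the marker bits are
   arbitrarily set to 0 (they add no bytes), and 20 + 8 + 12 bytes are
   assumed for the IP, UDP and RTP headers with no options or CSRCs.
   Counted strictly by field width, the GeRM byte plus a 16-bit sequence
   number is 3 bytes, and adding the 8-bit SSRC low byte makes 4.

```python
from typing import Iterable, Sequence

# Extra bytes that follow the one-byte GeRM header when a flag is 1,
# indexed B0..B7. B1 is the marker bit itself and carries no extra field.
FIELD_BYTES = (1, 0, 1, 2, 4, 3, 1, 1)

IP_UDP_RTP = 20 + 8 + 12  # assumed: IPv4 without options, UDP, RTP without CSRCs


def subheader_size(bits: Sequence[int]) -> int:
    """Size in bytes of one GeRM sub-header, given its bits B0..B7."""
    return 1 + sum(width for bit, width in zip(bits, FIELD_BYTES) if bit)


def total_overhead(subheaders: Iterable[Sequence[int]]) -> int:
    """Header bytes in one multiplexed packet, excluding the payloads."""
    return IP_UDP_RTP + sum(subheader_size(bits) for bits in subheaders)


# Flag bits B0..B7 for flows 1, 2, 3, 6, 9 and 10 of the packet above.
PACKET_3_2 = [
    (0, 0, 1, 0, 0, 0, 0, 1),  # flow 1: PT and length
    (0, 0, 0, 1, 0, 0, 0, 0),  # flow 2: sequence number
    (0, 0, 0, 1, 0, 0, 0, 0),  # flow 3: sequence number
    (0, 0, 0, 1, 0, 0, 1, 0),  # flow 6: sequence number and SSRC low byte
    (0, 0, 0, 1, 0, 0, 1, 0),  # flow 9: sequence number and SSRC low byte
    (0, 0, 0, 1, 0, 0, 0, 0),  # flow 10: sequence number
]

print([subheader_size(bits) for bits in PACKET_3_2])  # [3, 3, 3, 4, 4, 3]
print(total_overhead(PACKET_3_2))                     # 60
print(len(PACKET_3_2) * IP_UDP_RTP)                   # 240 for separate packets
```

   The same function gives 11 bytes for a sub-header with B3 to B6 all
   set, which matches the figure used in Section 3.1.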
   We might envisage ways in which sequence numbers of flows can also be
   manipulated as a flow returns from silence suppression (step the
   sequence number to match that of the flow with preceding SSRC) if we
   are sure that the flow will be removed from RTP at the next
   destination. This would reduce the per-flow overhead to between 1 and
   4 bytes depending on the effectiveness of this mapping. Whether this
   is worth pursuing is an open issue that providers may consider. It
   MUST NOT be done unless the destination knows to expect such
   behaviour and not treat it as loss.

Appendix A: Author's Address

   Mark Handley
   Information Sciences Institute, University of Southern California,
   c/o MIT Laboratory for Computer Science,
   545 Technology Square,
   Cambridge, MA 02139,
   United States
   electronic mail: mjh@isi.edu

4 Bibliography

   [1] H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson, "RTP: A
       Transport Protocol for Real-Time Applications", RFC 1889.

   [2] R. Braden, Ed., L. Zhang, S. Berson, S. Herzog and S. Jamin,
       "Resource ReSerVation Protocol (RSVP) -- Version 1 Functional
       Specification", RFC 2205.

   [3] S. Casner and V. Jacobson, "Compressing IP/UDP/RTP Headers for
       Low-Speed Serial Links", Internet Draft.

   [4] M. Handley, "Guidelines for writers of RTP payload format
       specifications", Internet Draft.