Editor's note: These minutes have not been edited.

Minutes of the Audio/Video Transport Working Group

Reported by Steve Casner

1. Introduction and status

The AVT working group met for two sessions at the San Jose '96 IETF. AVT has produced the Real-time Transport Protocol, which was published in January 1996 as a Proposed Standard RFC 1889 along with the companion RTP profile for audio/video conferencing RFC 1890. The minimum interval before advancement to Draft Standard has now passed; during that time, RTP use has grown with additional independent implementations and a broader range of applications. Accordingly, at this meeting we discussed what changes to the spec may be required for advancement to Draft Standard.

In particular, the biggest topic was RTCP scalability for very large sessions, especially with asymmetric or low data rate links. Also of significance for low-speed links was the second major topic of this meeting, an update on the IP/UDP/RTP header compression protocol.

Since the last meeting, RFCs 2029, 2032, 2035 and 2038 defining the RTP payload formats for CellB, H.261, JPEG and MPEG video encodings were also published as Proposed Standards, in October 1996. In the second AVT session, a problem with RFC 2038 regarding packet loss resilience for MPEG2 was presented, as were RTP Payload Formats for Redundant Audio, G.728 audio and H.263 video. A status report on development of an RTP MIB was presented; this is a potential future work item for the group, along with completion of the changes for transition of the RTP spec to Draft Standard and standardization of additional profiles and payload formats.

2. RTCP scaling for large "broadcast" sessions

Before this IETF meeting, there were several messages to the AVT mailing list regarding use of RTCP in very large "broadcast" scenarios, especially those where endpoints are connected by low-speed modems or asymmetric cable systems with low-speed uplinks.
Bernard Aboba and Henning Schulzrinne gave presentations on the problems that arise along with a variety of possible solutions.

The interval between RTCP packets increases with the group size so that RTCP consumes a constant fraction of the session bandwidth. This keeps RTP well behaved as it scales to large groups. However, there are scenarios for which the current spec is inadequate:

- Because the initial RTCP report is sent within a fixed interval after joining the session independent of the group size, there can be a large bandwidth spike if many participants join at the same time, e.g., for a scheduled event. Convergence to the proper report interval may take too long, in particular for low-bandwidth sessions.
- For sessions larger than a few thousand, the long RTCP interval may preclude collection of the desired information.
- The memory required by the suggested RTCP interval calculation algorithm to track all the participant SSRC identifiers may be excessive for very large sessions.
- For some types of sessions, privacy of reception may be more important than the feedback RTCP provides.
- On low-speed modems, giving even 5% of the bandwidth to RTCP draws cries from the people who have to squeeze the data to fit.
- Network operators are concerned about the multicast routing state required to track all the participants as multicast sources.

Several potential solutions for these problems were presented and evolved in the discussion that followed:

- The initial bandwidth spike can be avoided if the report interval is "reconsidered" before sending a report. That is, sending is delayed until the updated group size estimate at the end of the interval is not much larger than it was at the start. Group size estimation can also be accelerated by having each participant send the estimate it has counted, but use the maximum of all estimates received to calculate the interval.
- Memory requirements may be reduced by storing only a sampling of the SSRC identifiers heard, based on a probability function. The probability can be scaled as needed to fit the group size.
- The report interval can be reduced by having only some of the participants report, perhaps controlled by a sampling function from the sender, if partial feedback is appropriate.
- Receiver RTCP reports could be sent via unicast to the sender or some monitor, or sent on a separate multicast address, but the report interval must be controlled by some external means since the receiver can't calculate the interval itself. There is a danger that if receivers allow some other entity to control their report rate and destination, this could be subverted to packet bomb some innocent party.
- It is always safe (from a network load point of view) if receive-only participants do not send RTCP. This avoids several of the problems listed, but precludes feedback. The development of diagnostic tools such as mtrace makes RTCP less critical for network diagnosis than before.
- Report bandwidth can be reduced by summarization of reports by selected receivers through self-organization using TTL scoping, or by RTP translators explicitly interposed in the distribution tree. This adds significant complication.

Many good points were brought up in the discussion of these possible solutions:

- Using a distributed consensus algorithm driven by multicast is robust because a single defector can't inflate the rate.
- Encryption can be used for privacy, though this can add legal complications in some countries.
- The current RTCP algorithm represents a design point that we believe works well for a wide range of applications and which has been tested in experiments on the MBone. But there may be changes needed to extend operation to highly asymmetric networks or other scenarios outside that range.
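
As a rough illustration of the interval scaling and the "reconsideration" idea discussed in this section, the following Python sketch computes a report interval that grows with group size and re-checks it when the timer expires. The function names and the 100-byte average RTCP packet size are assumptions made for this example. The 5% fraction is the figure mentioned above and the 5-second minimum is the one in RFC 1889. Randomization of the interval and the separate sender and receiver bandwidth shares are deliberately left out, so this is not the algorithm from the spec or from any proposal presented at the meeting.

```python
def rtcp_interval(members, session_bw_bps, avg_rtcp_bytes=100.0,
                  rtcp_fraction=0.05, min_interval=5.0):
    """Seconds between RTCP reports for one participant.

    The interval grows linearly with the group size so that all reports
    together use a constant fraction of the session bandwidth.
    avg_rtcp_bytes is an assumed value chosen for illustration.
    """
    rtcp_bytes_per_sec = session_bw_bps * rtcp_fraction / 8.0
    return max(min_interval, members * avg_rtcp_bytes / rtcp_bytes_per_sec)


def reconsider(now, last_sent, members_now, session_bw_bps):
    """Decide at timer expiry whether to send, using the current group size.

    Returns (send_now, next_time). If the group grew while the timer was
    running, next_time moves into the future and the report is postponed.
    """
    next_time = last_sent + rtcp_interval(members_now, session_bw_bps)
    return (next_time <= now, next_time)


if __name__ == "__main__":
    # A participant joins at t=0 knowing only itself and sets a 5 s timer.
    print(rtcp_interval(1, 64000))
    # At t=5 it has heard 1000 members, so the report is postponed.
    print(reconsider(5.0, 0.0, 1000, 64000))
```

With these assumed numbers, a participant that joins a 64 kb/s session alongside a thousand others waits until the recomputed time instead of sending after 5 seconds, which is the spike-avoidance effect described for scheduled events.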

Each of the potential solutions has some drawbacks, and there is not one solution that is the clear answer in all cases. Further experimentation is needed to test the proposed ideas. However, it was agreed that the wording in the main RTP spec should be modified to allow more flexibility in how RTCP may be used, and that the means by which different modes should be defined and selected is through the specification of (a small number of) additional RTP Profiles. These profiles might be specified in the description of a session, e.g., in SDP using "RTP/AVB" ("B" for broadcast) rather than "RTP/AVP". Proposals are solicited in the form of Internet Drafts defining additional profiles to be the subject of experimentation and then consideration for standardization.

3. IP/UDP/RTP header compression

Steve Casner presented an update on the proposal developed with Van Jacobson for hop-by-hop compression of IP/UDP/RTP headers to allow use over low-speed lines. This proposal was introduced at the previous AVT meeting in Montreal. The update incorporates changes agreed in Montreal and subsequent improvements:

- The context ID is always sent, and in the first byte to allow overlap with a link-level segmentation scheme when feasible.
- The sequence number is increased to 4 bits and is changed to be context-specific rather than global.
- A new delta encoding scheme was designed to fit the patterns encountered in typical RTP applications, replacing the temporary use of the scheme from RFC 1144 TCP/IP header compression.
- RTP header compression is now integrated into the IPv6 header compression scheme (which also supports IPv4 and encapsulation in multiple headers).
- It is now specified that RTCP packets will not be compressed, since the traffic fraction is small and the required increase in shared context would be impractical.

Manoj Leelanivas presented a report on his implementation of the header compression algorithm and its performance.
He emphasized that since the identification of which UDP streams carry RTP is only through heuristics, it is essential to have a negative cache for UDP streams that don't compress; otherwise, the context cache will thrash as each new packet may fail to match any existing context. The compression scheme performed as expected: most audio packet headers compressed to 2 bytes without UDP checksum or 4 with; most video packet headers compressed to 4 or 6 bytes.

During discussion of the proposal, it was agreed that the delta encoding should be specified to be table driven, with the default table being as presented but with holes in the encoding space used to encode negative deltas that may occur with MPEG or out-of-order packets. Also, for use of header compression over higher-speed lines where there may be a larger number of contexts, either an 8- or 16-bit context ID should be allowed via negotiation.

The working group agreed that this proposal should go to Last Call with these revisions in place. Steve Casner also made a short presentation in the PPPEXT working group meeting to coordinate the allocation of PPP packet types that will be required for this compression scheme.

4. Transport aspects of RTSP Proposal

The Real-Time Streaming Protocol is under discussion primarily in the MMUSIC working group, but also has some transport aspects that were discussed briefly in AVT. The RTSP proposal has included a reduced-size variant of RTP that was termed "compressed RTP". Since the AVT working group agreed in Montreal that the combined IP/UDP/RTP compression scheme described in the previous section should be standardized and that an end-to-end RTP-only compression scheme should not, the RTSP authors will extract the definition of the RTP variant into a separate non-standards-track interim proposal named CUSH, for Compressed UDP Stream Header.
In the main RTSP specification, the means of specifying the underlying stream transport protocol will be made more flexible to provide better separation between control and transport.

5. RTP payload formats

The second AVT session covered topics beyond the main RTP spec, primarily additional payload formats plus some potential new application areas for RTP.

5.1 Redundant audio payload format

Colin Perkins gave an update on draft-perkins-rtp-redundancy-01.txt which defines a mechanism for carrying redundant audio formats in RTP to compensate for packet loss. The redundant copy is usually more heavily compressed to reduce overhead. This updated draft includes modifications suggested at the Montreal meeting. Two interoperating implementations are in regular use. The working group agreed that this proposal should go to Last Call after a few minor edits.

5.2 Payload format for H.263 video

Chad Zhu presented revisions to the payload format for H.263 video in draft-ietf-avt-rtp-payload-02.txt. These revisions reflect changes in the ITU H.263 spec in addition to suggested clarifications and other minor changes from the AVT group.

One change, prompted by the development of H.263+, was to add a version number in the payload format. This change generated some discussion because it introduces another multiplexing point in the processing of each data packet. The group concluded that it would be preferable to design the payload format to accommodate the expected changes as well as possible, and then assign a new payload type if an incompatibility arises. The H.263 payload format spec should also be ready for Last Call after the version number is removed.

5.3 Problems with MPEG2 payload format

The payload format spec for MPEG1 and MPEG2 was recently published as Proposed Standard RFC 2038. Reha Civanlar has identified a problem with this spec in that the packet loss resilience information is insufficient for MPEG2.
He presented two recommendations for supplying the necessary information:

- An optional second payload-specific header word would be added to carry the additional picture layer information required, and redundant sequence and GOP headers would be transmitted periodically as needed to achieve the desired loss probability. The overhead would be less than 1% for a 4 Mbps data rate assuming 2 sequence/GOP header retransmissions per second.
- To reduce the redundancy overhead, the "high priority" header information could be sent using a "reliable" protocol prior to the main transmission of the video data using RTP. Since inclusion of the additional redundancy information in the payload header is optional, use of this method does not require changes to RTP.

The group accepted the proposed extensions, and the chair asked Reha to write up the proposed change for inclusion into the MPEG payload format specification before its transition to Draft Standard.

Reha also proposed an alternate format for bundled audio + video MPEG payload format described in draft-civanlar-bmpeg-00.txt. This format would give reduced overhead and better error resilience compared to the encapsulation defined in RFC 2038 for MPEG1 Systems or MPEG2 Transport streams. However, such improvements were considered in an early draft of RFC 2038 but were discarded because applications that have data in the Systems or Transport streams formats and can't afford the data handling required to take advantage of the benefits of separate audio and video Elementary streams also could not afford to use the more efficient bundled format. Reha's proposal is put forth for comparison testing and will be considered as an alternate format.

5.4 Payload format for G.728 audio

Ofer Shapiro has submitted text to be added to the RTP Audio/Video Profile to describe the payload format for G.728 audio. No separate payload format spec is needed since only a few paragraphs are needed to describe the format.
The four 10-bit vectors per audio frame are simply packed into 5 bytes, MSB first. Multiple frames may be packed into each packet. This format has been accepted by the ITU study group covering the use of RTP in H.323.

6. New applications of RTP

Several proposals for new applications of RTP have been recently submitted to the working group. Two of them, a payload format for carrying HTTP over RTP in draft-aboba-rtp-http-01.txt, and a proposal for adding Scalable Reliable Multicast mechanisms to RTP, described in draft-parnes-rtp-ext-srm-01.txt, fall into the general area of reliable multicast which the Transport Area Directors want to organize as a separate IRTF Research Group and later IETF Working Groups. The group discussed this plan and the fit of new work into the AVT charter, but discussion of these proposals was deferred until organization of the reliable multicast research area is sorted out.

6.1 Using RTP with caching

Two presentations on new applications were made at this meeting. The first was by Roger Kermode on the idea of layering audio and video streams in time as well as quality, then combining this layering with "proactive" caching to reduce latency and bandwidth requirements for on-demand playback. Multiple RTP streams (layers) from a particular video subject would be used for the different access phases implied by (near) on-demand access as well as for multiple quality levels as desired by different receivers. Receivers would combine multiple streams from caches and original sources with local storage to produce the desired quality of playback at the desired time. This work is at the research stage, but is likely to draw on RTP (possibly with SRM extensions), RTCP, RTSP and SDP as building blocks.

6.2 Aggregation Service within RTP

The second presentation was by Jonathan Rosenberg on multiplexing several RTP audio sessions into one packet stream.
This could be used, for example, to increase packet efficiency substantially between Internet Telephony gateways. Such gateways have recently been developed and deployed to allow long-distance telephone callers to dial a local gateway which then establishes an RTP stream to another gateway near the desired destination and instructs the remote gateway to dial the callee's telephone. There may be many parallel streams of small packets between these gateways; aggregating these streams into larger packets conserves header overhead without increasing delay as would be the case for large packets in a single stream.

Jonathan presented various methods for assigning individual streams to logical channels in the aggregate and for carrying the information that must remain specific to each stream. The variations trade off efficiency and the extent to which interpretation of fields in the RTP header must change. Full details of the options and efficiency results are given in draft-rosenberg-itg-00.txt and .ps. Further work is needed before deciding whether some standardization should be undertaken.

7. Status report on RTP MIB

Stan Naudus gave a status report on the development of an RTP MIB. While RTCP provides an efficient and scalable means to obtain feedback from end systems in a multicast session, it does not provide third party management of unicast sessions such as in Internet Telephony. There may also be a need for additional remote control of RTP mixers and translators beyond what would be practical to implement with RTCP.

The challenge in designing a MIB for RTP is to make it scalable over the wide variety of applications in which RTP might be used. The design currently includes 8 tables and 62 objects. Greg Minshall questioned whether 62 was too many objects for a practical MIB. The goal of the authors is to continue work on the MIB, including implementation testing to assess practicality, and submit it for AVT consideration at the next meeting in Memphis.

8. Advancing RTP to Draft Standard

As noted in the introduction, it is time to advance the RTP spec, RFC 1889, and the RTP Profile, RFC 1890 to Draft Standard. Steve Casner presented a list of (potential) changes to be addressed for advancement:

- changes for RTCP scaling as described earlier
- rule changes proposed for layered encodings as described in draft-speer-avt-layered-video-01.txt
- revision of the loop/collision algorithm for separate RTP and RTCP source port numbers
- some clarifications of the wording and small changes such as allowing separate unicast ports that have already been made in the spec source files
- allowing separate multicast addresses for RTP/RTCP

These items are all straightforward or were previously discussed except for the last. Should the specification of an RTP session be generalized to allow different addresses to be used for RTP and RTCP? This would allow RTCP delivery to be constrained for improved scalability in some scenarios. After some discussion, the consensus of the group was that we should take the conservative approach and not make this change at this time.

In addition to any required revisions of the RTP specification, advancing to Draft Standard requires evidence that interoperability requirements have been met. This should be easy to provide since there are several genetically distinct implementations of RTP, both research and commercial, in use. The exact form required for the evidence has not been determined by the Area Directors yet, but it is likely that an applicability document will be needed to list the areas in which RTP has been successfully used and the areas to which we believe it can be extended. We may also be required to document the scalability of RTP in a manner similar to that proposed by the Area Directors as a requirement for reliable multicast protocol proposals.

9. Advancing RTP Profile to Draft Standard

Since the RTP Profile is a simpler document than the main RTP spec, there are fewer issues for its advancement. One question of significance is whether the Profile should restrict the format of the RTCP SDES CNAME item beyond the suggestions of the RTP spec so that separate implementations of audio and video tools will be more likely to use a common format. This commonality is important to allow audio and video streams to be associated for synchronization and other presentation management. The group agreed that the profile should recommend use of only the numeric address form of the CNAME.

In addition to this change, there are several of the simple audio payload formats included in the Profile that need some additional details specified, such as bit and byte order for packing. Some additional formats, such as G.723 and G.728, will also be added.

10. Future work

The AVT meeting ended with a discussion of how much work remains, since the original charter has been completed. The group agreed to meet again in Memphis, with the intention that work on new Profiles for RTCP scaling and the new Payload Formats currently in progress can be finished then.