avt-minutes-96mar.html

CURRENT MEETING REPORT

Minutes of the Audio/Video Transport Working Group (avt)

Reported by Steve Casner, Precept Software, Inc.

The primary output of the AVT working group is the Real-time Transport Protocol. With the publication in January of the RTP spec as RFC1889 and the companion RTP profile for audio/video conferencing as RFC1890, it appeared that the group's work was completed except for progressing RTP from Proposed Standard to Draft and full Standard status. As a result, the group did not meet at the 34th IETF in Dallas and initially there was no meeting planned for this IETF. However, in a discussion on the group mailing list, several people brought forth topics appropriate for presentation to the group:

Proposed new RTP payload formats
RTP and MBone monitoring
RTP header compression for low bandwidth links
Fostering industry adoption of RTP for interoperability

This meeting consisted of several presentations on these topics plus some miscellaneous issues as described below. These topics will be discussed further in Internet-Drafts and on the mailing list.

1. Proposed new RTP payload formats

In addition to RFC1889 and 1890, there are four drafts awaiting publication which define the RTP payload formats for H.261, JPEG, MPEG and CellB video encodings. At this meeting, three new payload formats were proposed. In addition, to accommodate hierarchical encodings, changes to some rules stated in the main RTP spec were also proposed.

1.1. RTP changes to support hierarchical encodings

Michael Speer from Sun Microsystems presented an overview of hierarchical (or layered) encodings and a proposal to adapt RTP to better accommodate them. Hierarchical encodings break the media stream into an ordered collection of layers. The base layer provides a complete but low-quality signal that may be enhanced by composing it with additional layers. Receivers adjust the number of layers received to avoid exceeding the network bandwidth available or the receiver's own processing power. The layers are sent on separate IP multicast addresses so that multicast routing can prune the distribution tree separately for each layer.

To keep track of the ordered collection of N layers, it is proposed that N consecutive multicast addresses and 2N consecutive port numbers be allocated (ports are used in pairs for RTP and RTCP). For unicast, the single address would be used for all layers. This convention does not impact the specification of RTP itself.
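The allocation convention above can be sketched as follows. This is only an illustration: the function name, the example group address and the base port are hypothetical, and the sketch assumes the usual RTP practice of an even port for RTP with RTCP on the next higher port.

```python
import ipaddress

def layer_transport_addresses(base_addr, base_port, n_layers, unicast=False):
    """Return one (address, rtp_port, rtcp_port) tuple per layer.

    Layer 0 is the base layer. For multicast, layer i uses the i-th
    consecutive address; for unicast, every layer uses the same address.
    Ports are always allocated in consecutive RTP/RTCP pairs.
    """
    if base_port % 2 != 0:
        raise ValueError("assumed convention: RTP port is even, RTCP is the next odd port")
    base = ipaddress.IPv4Address(base_addr)
    layers = []
    for i in range(n_layers):
        addr = base if unicast else base + i
        rtp_port = base_port + 2 * i
        layers.append((str(addr), rtp_port, rtp_port + 1))
    return layers

# Three layers on hypothetical multicast addresses starting at 224.2.0.0.
for layer in layer_transport_addresses("224.2.0.0", 5004, 3):
    print(layer)
```

A receiver that can only handle two layers would join the first two addresses in the returned list and leave the third unjoined, so that multicast routing can prune it.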

Two changes to RTP are proposed: 1) to use the same SSRC ID for all layers from a given source and perform conflict resolution only on the base layer; and 2) to omit the RTCP SDES information (including CNAME) for all but the base layer, since it would be redundant. RTCP RR and SR packets would still be sent separately for each layer because that information is independent.

RTP already allows a source to use the same SSRC ID in separate sessions, but does not guarantee that the ID will remain unchanged due to collisions. If conflict resolution is done only in the base layer as proposed, should BYE packets still be sent in the other layers?

A key question for RTP is whether omitting the CNAME for the enhancement layers would unreasonably impair the operation of third-party monitors that were not aware of the association of the N layers. Another question is what impact this consecutive multicast address allocation requirement would have on the allocation schemes such as SDP/SDAP (discussed in the MMUSIC working group). Assuming successful implementation and testing of this proposal and no technical objections, it is expected that the proposed changes will be incorporated into the RTP spec when it is revised for the transition to Draft Standard.

1.2. Payload format for H.263 video

H.263 is a new ITU-T Recommendation for video compression. It is similar to H.261, but includes optimizations to support lower bit rates. An RTP payload format for H.261 has already been defined and is currently in IESG Last Call before publication as an RFC. It is not possible to use the H.261 payload format directly for H.263 because more bits are required to specify the macroblock address, the additional motion vectors for Advanced Prediction, and the additional fields for decoding interleaved "PB" frames.

Chunrong Zhu from Intel presented a proposed payload format for H.263 that copies some fields from the H.261 format and adds more. To support all options of H.263 requires 10 bytes of payload header compared to 4 bytes for H.261, but fewer bytes are required when some of the options are disabled. Therefore, three different modes A, B and C are defined, each with a different payload header size (4, 8, and 12 bytes, respectively, to maintain 32-bit alignment). The three modes may be intermixed in a single stream to maximize efficiency as allowed by the data observed.

The only significant question raised regarding this proposal was whether the additional complexity compared to H.261 is justified. Note that a packet format with optional fields requires extra tests in the data processing path and makes optimization difficult. It also increases the likelihood of bugs. However, in this case, the smallest form of the payload header can only be used when a GOB will fit within a single RTP packet. The GOB size will often exceed the network MTU, so the larger forms of the payload header are required as well. The only way to simplify the format is to require that the maximum size header be used all the time. This would be too inefficient, especially at low data rates. Therefore, the proposed design appears to be a reasonable compromise.

Comments from the working group are sought both on technical aspects of the design and on the completeness and clarity of the specification. Assuming successful implementation of the H.263 payload format specification, the draft should be submitted for publication as an RFC.

1.3. Payload format issues for redundant encodings

Mark Handley from UCL gave a presentation about work at UCL and INRIA to improve audio quality in the presence of packet loss through redundant encodings. The idea is to piggy-back one or more highly compressed redundant encodings of the audio data onto later packets of the primary encoding. That way, if a packet of the primary encoding is lost, a lower-quality version of the missing audio can be reconstructed from the redundant encoding in the later packet.

What's undecided is how to carry the redundant payloads in RTP, since RTP was only designed to carry a single payload. Three ideas were proposed:

Put the redundant data in an RTP header extension (there are several problems with this idea so it is disregarded);
Define a set of dynamic payload types to indicate the combinations of primary and redundant codings to be used in a session (this has the lowest overhead but allows at most 32 combinations);
Define a single (static?) payload type that indicates redundant encodings, then in the payload section construct a chain of type-length-value blocks where the type is a normal payload type, the length is a one-byte field, and the value is the audio data. The presence of additional blocks is indicated by setting the MSB of the type byte. The primary encoding would come last and would have no length field so it is not constrained to a length of 256 bytes.
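To make the third idea concrete, here is a minimal sketch of the block chain as described in these minutes. It is not a proposed or agreed wire format: the function names are hypothetical, the length byte is assumed to hold the literal byte count (so at most 255), and the redundant blocks are assumed to precede the primary block.

```python
MORE_BLOCKS = 0x80  # MSB of the type byte: another block follows this one

def pack_redundant(redundant_blocks, primary_pt, primary_data):
    """Build the payload: redundant (type, length, value) blocks, then the primary."""
    out = bytearray()
    for pt, data in redundant_blocks:
        if not 0 <= pt < 128:
            raise ValueError("payload type must fit in 7 bits")
        if len(data) > 255:
            raise ValueError("redundant block too long for a one-byte length")
        out += bytes([MORE_BLOCKS | pt, len(data)]) + data
    if not 0 <= primary_pt < 128:
        raise ValueError("payload type must fit in 7 bits")
    out.append(primary_pt)   # MSB clear: this is the last block
    out += primary_data      # no length field; runs to the end of the payload
    return bytes(out)

def unpack_redundant(payload):
    """Return [(payload_type, data), ...] with the primary encoding last."""
    blocks = []
    pos = 0
    while True:
        if pos >= len(payload):
            raise ValueError("truncated payload")
        type_byte = payload[pos]
        pt = type_byte & 0x7F
        if not type_byte & MORE_BLOCKS:
            blocks.append((pt, payload[pos + 1:]))
            return blocks
        if pos + 2 > len(payload):
            raise ValueError("truncated block header")
        length = payload[pos + 1]
        data = payload[pos + 2:pos + 2 + length]
        if len(data) != length:
            raise ValueError("truncated block data")
        blocks.append((pt, data))
        pos += 2 + length

payload = pack_redundant([(5, b"abc")], 0, b"primary")
print(unpack_redundant(payload))
```

The per-block cost in this sketch is two bytes for each redundant encoding and one byte for the primary, which is the overhead the second scheme avoids by folding the combination into the payload type.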

The INRIA and UCL teams differ on the choice between the second and third schemes, so advice is sought from the working group. No clear-cut deciding factors were identified in the meeting. As the work progresses, it is expected that a draft leading to one or more new RTP payload format specifications will be produced.

1.4. Payload format for ASF streams

Tim Kwok from Microsoft gave a preliminary presentation on a new proposal for "Active Stream Format" that is about to be introduced by Microsoft. ASF is a multiplexing scheme for audio, video, images, scripts and URLs that is intended to serve for both storage and transmission. Microsoft is seeking input on how to packetize ASF in RTP, and would also like to include in ASF a format that can record a collection of RTP packet streams constituting a multimedia session.

Only a few details were presented since the announcement of ASF was to be made the next week and therefore the document describing it was not yet available. ASF is a "framework" that can include filters for translation between storage and transmission formats. It is intended to be transform independent, and will provide a variety of error concealment methods. One question is how these error concealment methods can be coordinated with the RTP layer.

The proposal was that an RTP payload type be allocated to indicate a multiplexed ASF stream. One motivation claimed for multiplexing in this manner is that it simplifies synchronization. However, this goes against the RTP notion that streams should be sent separately to allow receivers to select among them and synchronize the chosen ones using the timestamps provided in RTCP SR packets. There were several comments from meeting participants who questioned the motivation for multiplexing. Perhaps the strongest supporting reason is that high volume multimedia servers do not have time to take apart stored media and construct separate RTP streams.

Greg Minshall summarized the WG sentiment: the participants have a number of questions about the proposal, but would like to hear more details. Microsoft is to produce a draft proposing an encapsulation of ASF into RTP after the ASF document is available, and then this proposal will be discussed further in the working group.

1.5. Using dynamic payload types

Steve Casner presented one slide as a reminder to implementers to include support for dynamic payload types. Creators of new payload formats have asked for static payload types to be assigned, but the 7-bit space is not large enough to define a type for all who might request one. Instead, new encodings should be tested using dynamic payload types and might later have static types assigned if they prove to be important for interoperation among implementations.

The RTP A/V Profile (RFC 1890) defines payload types 96-127 to be dynamic per session. The mapping between these types and format descriptions in a larger space is conveyed in a session protocol. For example, the SDPv2 spec under development in the MMUSIC WG provides this function. It may be appropriate to register the format names to be used in that larger space, but that is not strictly an RTP issue since the names are carried in session protocols.
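As an illustration of the per-session mapping, the sketch below hands out dynamic payload types from the 96-127 range and records which format name each one stands for. The class name and the format names are hypothetical; how the mapping is actually conveyed is up to the session protocol.

```python
DYNAMIC_PAYLOAD_TYPES = range(96, 128)  # dynamic range defined by RFC 1890

class DynamicPayloadMap:
    """Per-session table from dynamic payload type to a format name."""

    def __init__(self):
        self._by_pt = {}

    def assign(self, format_name):
        """Bind the lowest unused dynamic payload type to format_name."""
        for pt in DYNAMIC_PAYLOAD_TYPES:
            if pt not in self._by_pt:
                self._by_pt[pt] = format_name
                return pt
        raise RuntimeError("all 32 dynamic payload types are in use")

    def lookup(self, pt):
        """Return the format name bound to pt, or None if it is unbound."""
        return self._by_pt.get(pt)

session = DynamicPayloadMap()
pt = session.assign("H263")  # hypothetical format name
print(pt, session.lookup(pt))
```

The binding is valid only within the session that announced it, which is why an implementation has to consult the session description rather than a fixed table for these types.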

2. RTP and MBone monitoring

As RTP begins to see more use, we need to learn how RTCP feedback can help in the operation of applications and general monitoring of the MBone. Steve Casner introduced the topic with a slide listing the information provided by RTCP: participant descriptions, packet loss and jitter as seen by all receivers, and propagation delay from the sender to each receiver. This information may be combined with route mapping data to produce a graphical display, as Paul Stewart did a year ago in his msessmon program. Now mtrace could be used to collect more accurate routes.

2.1. Tracking session participants

Kevin Almeroth and collaborators at Georgia Tech have implemented a tool called mcollect to gather statistics about participation dynamics in MBone sessions. This program does not yet make use of the content of RTP or RTCP packets; instead it measures the start and end times of participation in a session based on source host addresses. A number of interesting observations of user behavior were made, such as "session surfing" and significant variations in connection patterns for different types of sessions. Kevin posed as an open issue the question of how to take advantage of the information carried by RTCP.

2.2. Using RTCP feedback

Andrew Swan responded to a last-minute request to say a few words about work at UC Berkeley on using RTCP. He pointed out that many interesting events have been transmitted on the MBone, but that we still need to be able to better diagnose problems in the multicast distribution. RTCP feedback will be an important part of the solution.

Collecting feedback with RTCP is easy. The hard part is figuring out how to analyze and present the information. The first idea, for example in presenting the loss rates seen by all receivers, is a table. But how should the table be sorted? Work is underway to implement and test different techniques, and will be presented in more detail in the future.

3. RTP header compression for low bandwidth links

Internet Telephony is a rapidly growing application, but the commercial products do not use RTP. Part of the reason may be that they were developed before RTP was published as an RFC, but another factor may be the bandwidth overhead imposed by RTP. Even though minimizing overhead was a key consideration in the design of RTP, for very low rate audio the 12-byte RTP header may still be a problem.

Scott Petrack and Ed Ellesson of IBM have proposed that the working group define "C/RTP", a compressed-header form of RTP. Before the meeting, they sent a preliminary draft to the mailing list outlining a framework for C/RTP including examples of compression techniques that might be employed. Scott gave a presentation on these ideas at the meeting.

To motivate this effort, Scott provided a calculation showing that latency due to packetization delay increases linearly with packet header size when bandwidth is constrained. That is because one must increase the packet size if the header size is increased in order to keep the overhead ratio constant and not exceed the fixed bandwidth limit. Since highly compressed audio signals may use frames on the order of 24 bytes, the 12 bytes of RTP header (plus 28 bytes of IPv4 and UDP headers) are a significant overhead. Frames are generated every 20 or 30 milliseconds, and to minimize latency it is best to send only one frame per packet.
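The linear relationship can be reconstructed as follows; this is a sketch of the argument, not the calculation shown at the meeting. With H header bytes and P payload bytes per packet, a codec rate of R bits/s and a link rate of B bits/s, the stream fits on the link only if R * (H + P) / P <= B, so P >= H * R / (B - R). The packetization delay is the time needed to fill P bytes, 8 * P / R, which is therefore at least 8 * H / (B - R). The 14,400 bit/s link rate in the example is an assumption chosen for illustration.

```python
def min_payload_bytes(header_bytes, codec_bps, link_bps):
    """Smallest payload per packet that keeps the stream within the link rate."""
    if link_bps <= codec_bps:
        raise ValueError("link rate must exceed the codec rate")
    return header_bytes * codec_bps / (link_bps - codec_bps)

def min_packetization_delay_ms(header_bytes, codec_bps, link_bps):
    """Time needed to accumulate that payload; linear in header_bytes."""
    if link_bps <= codec_bps:
        raise ValueError("link rate must exceed the codec rate")
    return 1000.0 * 8 * header_bytes / (link_bps - codec_bps)

# 24-byte frames every 30 ms give a codec rate of 6400 bit/s.
codec_bps = 24 * 8 * 1000 // 30
link_bps = 14400  # assumed modem rate, not taken from the presentation

for header in (40, 12, 4):  # IPv4+UDP+RTP, RTP alone, a hypothetical compressed header
    print(header, min_packetization_delay_ms(header, codec_bps, link_bps))
```

Under these assumptions the full 40-byte header forces at least 40 ms of audio per packet, more than one frame, whereas a header of a few bytes would allow a single frame per packet.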

A variety of techniques for compressing the header size were proposed. Constant information such as the payload type and SSRC ID may be omitted if shared state can be established via some form of reliable communication (out of band). However, note that these values are constant only if the payload format is not changed and if there is only one source sending on the session (e.g., unicast). Some fields that are not constant may change by fixed amounts for contiguous packets, depending upon the media. It seems likely that the techniques designed by Van Jacobson for compression of IP/TCP headers over SLIP links may be applicable here, though some additional mechanism would be required to take the place of the TCP retransmissions that re-establish state after an error.
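The two ideas above, omitting constant fields and predicting fields that change by fixed amounts, can be sketched as below. This is not the C/RTP proposal, which was not specified at the meeting: all names are hypothetical, only four header fields are modeled, and the sketch assumes a lossless, in-order link, so it omits the mechanism needed to re-establish state after an error.

```python
from dataclasses import dataclass

@dataclass
class RtpHeader:
    payload_type: int
    ssrc: int
    seq: int
    timestamp: int

def predict_next(last, ts_step):
    """Expected next header: same constants, seq + 1, timestamp + ts_step."""
    return RtpHeader(last.payload_type, last.ssrc,
                     (last.seq + 1) & 0xFFFF,
                     (last.timestamp + ts_step) & 0xFFFFFFFF)

class Compressor:
    def __init__(self, ts_step):
        self.ts_step = ts_step
        self.last = None

    def compress(self, header):
        """Send a bare marker when the header matches the prediction."""
        predicted = predict_next(self.last, self.ts_step) if self.last else None
        self.last = header
        if header == predicted:
            return ("COMPRESSED",)
        return ("FULL", header)

class Decompressor:
    def __init__(self, ts_step):
        self.ts_step = ts_step
        self.last = None

    def decompress(self, packet):
        """Rebuild the header from shared state, or reset state on FULL."""
        if packet[0] == "FULL":
            self.last = packet[1]
        elif self.last is None:
            raise ValueError("compressed packet received before any full header")
        else:
            self.last = predict_next(self.last, self.ts_step)
        return self.last

comp, decomp = Compressor(ts_step=160), Decompressor(ts_step=160)
for i in range(3):
    packet = comp.compress(RtpHeader(0, 0x1234, 10 + i, 1000 + 160 * i))
    print(packet[0], decomp.decompress(packet))
```

A change of payload type, a second source, or a timestamp jump after a silence period breaks the prediction and falls back to a full header, which matches the caveat that these fields are constant only under certain conditions.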

One issue is whether the compressed RTP should be implemented in applications (as RTP often is) and used end-to-end, or whether it should be implemented at the endpoints of slow links, perhaps in PPP as with IP/TCP header compression. It seems likely that for RTP header compression to be effective, IP and UDP headers must also be compressed, which suggests the latter approach. However, Mikael Degermark cautioned against tying compression of all three layers into one mechanism if RTP compression depends upon handshaking between the two ends. The UDP header compression that has been designed as part of the IPNG effort works over simplex connections without a handshake.

Scott's presentation did not include a specific proposal for the C/RTP protocol, but rather was a call for the working group to take up this problem as a work item. The general sentiment at the meeting was that this would be appropriate. Comments pro or con are solicited.

Scott also offered a GSMVQ audio encoding algorithm for consideration as a format to be used with RTP.

4. Fostering industry adoption of RTP for interoperability

The first topic that was raised in calling for the working group to meet at this IETF was a discussion of what should be done to foster industry adoption of RTP. However, there was not sufficient time to organize participation in the meeting by companies other than those who have been involved in the working group for some time, so there was not a lot of discussion here. Others cited adoption of RTP in Netscape's LiveMedia initiative and in the H.323 recommendation of the ITU-T as evidence that the industry is already paying attention to RTP. Nonetheless, there may be issues such as the need for header compression that are industry concerns. It was suggested that the group try to organize more industry participation at the next IETF in Montreal.

One proposal was to sponsor a Connectathon-like event to test interoperation and conformance. Ross Finlayson countered that it should not be necessary to organize an event in one place for this. Instead, interoperability testing should be conducted across the network.

5. AVT working group logistics

It became apparent at this meeting that there is new work that should be addressed by the AVT working group, and that AVT should continue to hold separate sessions at IETF rather than merging with MMUSIC. In particular, it was agreed that AVT should meet in Montreal. The new work includes:

Additional payload format specs, as presented in this meeting
RTP header compression
Should there be an interface service definition for RTP?
Extending RTP for applications other than audio/video

On this last topic, during IETF week a new draft on "RTP extension for Scalable Reliable Multicast" (draft-parnes-avt-srm-00.txt) was posted to the mailing list by Peter Parnes from LuTH/CDT. Everyone is encouraged to read that draft.

Since AVT's initial charter has been completed, this new work needs an updated charter. The chair will prepare a revised charter for comments.

6. Miscellaneous issues

Steve Casner brought up one last issue before adjourning the meeting. In Section 8.2 of the RTP spec, an algorithm is defined for detecting collisions of SSRC ID allocations and loops induced by RTP mixers or translators. As defined, that algorithm will work properly only if sources use the same UDP source port number (not destination port number) for both RTP and RTCP packets in a session. However, not all implementations of RTP do that; the vat and vic programs from LBL are counterexamples. The algorithm can be modified to allow the ports to be different, at the cost of potentially locking on to the RTP packets from one source and the RTCP packets from another. On the other hand, some people have expressed the opinion that the algorithm is too complicated and hard to test (because collisions should be very rare). This is one area in which implementation experience is needed to determine what changes, if any, should be made before the transition of RTP to Draft Standard.
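The port dependency can be illustrated with a much simplified sketch. It is not the Section 8.2 algorithm: it only records the first source transport address seen for each SSRC ID and reports a conflict when a later packet with that ID arrives from a different one. The class name and the separate_rtcp option are hypothetical.

```python
class SsrcTable:
    """Remember the source transport address first seen for each SSRC ID."""

    def __init__(self, separate_rtcp=False):
        # separate_rtcp=False: one address per SSRC, shared by RTP and RTCP.
        # separate_rtcp=True: RTP and RTCP addresses are tracked independently.
        self.separate_rtcp = separate_rtcp
        self.seen = {}

    def consistent(self, ssrc, source_addr, is_rtcp):
        """Return False if this SSRC was previously seen from another address."""
        slot = "rtcp" if (self.separate_rtcp and is_rtcp) else "rtp"
        known = self.seen.setdefault((ssrc, slot), source_addr)
        return known == source_addr

# A source that sends RTP and RTCP from different UDP source ports.
rtp_addr, rtcp_addr = ("10.0.0.1", 4000), ("10.0.0.1", 4001)

strict = SsrcTable()
print(strict.consistent(0xABCD, rtp_addr, is_rtcp=False))   # first sighting
print(strict.consistent(0xABCD, rtcp_addr, is_rtcp=True))   # flagged as a conflict

relaxed = SsrcTable(separate_rtcp=True)
print(relaxed.consistent(0xABCD, rtp_addr, is_rtcp=False))
print(relaxed.consistent(0xABCD, rtcp_addr, is_rtcp=True))  # accepted
```

In the relaxed form nothing ties the RTP address to the RTCP address, so the table can bind the RTP packets of one source and the RTCP packets of another to the same SSRC ID, which is the cost noted above.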

Due to time limits, there was no discussion of this topic at the meeting. Steve Casner intends to write an informational RFC on the issue so that implementers of RTP can be aware of it.