CURRENT MEETING REPORT
Minutes of the Audio/Video Transport
Working Group (avt)
Reported by Steve Casner, Precept Software
The primary output of the AVT working group
is the Real-time Transport Protocol. With the publication in January
of the RTP spec as RFC1889 and the companion RTP profile for audio/video
conferencing as RFC1890, it appeared that the group's work was
completed except for progressing RTP from Proposed Standard to
Draft and full Standard status. As a result, the group did not
meet at the 34th IETF in Dallas and initially there was no meeting
planned for this IETF. However, in a discussion on the group mailing
list, several people brought forth topics appropriate for presentation
to the group:
This meeting consisted of several presentations
on these topics plus some miscellaneous issues as described below.
These topics will be discussed further in Internet-Drafts and
on the mailing list.
1. Proposed new RTP payload formats
In addition to RFC1889 and 1890, there are
four drafts awaiting publication which define the RTP payload
formats for H.261, JPEG, MPEG and CellB video encodings. At this
meeting, three new payload formats were proposed. In addition,
to accommodate hierarchical encodings, changes to some rules stated
in the main RTP spec were also proposed.
1.1. RTP changes to support hierarchical encodings
Michael Speer from Sun Microsystems presented an overview of hierarchical (or layered) encodings and a proposal to adapt RTP to better accommodate them. Hierarchical encodings break the media stream into an ordered collection of layers. The base layer provides a complete but low-quality signal that may be enhanced by composing it with additional layers. Receivers adjust the number of layers received to avoid exceeding the network bandwidth available or the receiver's own processing power. The layers are sent on separate IP multicast addresses so that multicast routing can prune the distribution tree separately for each layer.
To keep track of the ordered collection of N layers, it is proposed that N consecutive multicast addresses and 2N consecutive port numbers be allocated (ports are used in pairs for RTP and RTCP). For unicast, the single address would be used for all layers. This convention does not impact the specification of RTP itself.
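The allocation convention above can be sketched in a few lines of Python. This is only an illustration: the function name, the example base address, and the base port are hypothetical, and keeping a separate port pair per layer in the unicast case is an assumption not stated in the proposal.

```python
import ipaddress


def allocate_layers(base_addr, base_port, n_layers, multicast=True):
    """Return one (address, rtp_port, rtcp_port) tuple per layer.

    Layer 0 is the base layer. For multicast, layer i uses the i-th
    consecutive address; for unicast, every layer shares the single address.
    Each layer gets a consecutive pair of ports (RTP, then RTCP).
    """
    base = ipaddress.IPv4Address(base_addr)
    layers = []
    for i in range(n_layers):
        addr = base + i if multicast else base
        rtp_port = base_port + 2 * i  # ports are used in pairs
        layers.append((str(addr), rtp_port, rtp_port + 1))
    return layers


if __name__ == "__main__":
    for layer in allocate_layers("224.2.0.1", 5004, 3):
        print(layer)
```

A receiver that wants only the first k layers would join the first k addresses in this list and leave the rest.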
Two changes to RTP are proposed: 1) to use the same SSRC ID for all layers from a given source and perform conflict resolution only on the base layer; and 2) to omit the RTCP SDES information (including CNAME) for all but the base layer, since it would be redundant. RTCP RR and SR packets would still be sent separately for each layer because that information is independent.
RTP already allows a source to use the same SSRC ID in separate sessions, but because a collision may force a change, it does not guarantee that the ID will remain the same. If conflict resolution is done only in the base layer as proposed, should BYE packets still be sent in the other layers?
A key question for RTP is whether omitting the CNAME for the enhancement layers would unreasonably impair the operation of third-party monitors that were not aware of the association of the N layers. Another question is what impact this consecutive multicast address allocation requirement would have on the allocation schemes such as SDP/SDAP (discussed in the MMUSIC working group). Assuming successful implementation and testing of this proposal and no technical objections, it is expected that the proposed changes will be incorporated into the RTP spec when it is revised for the transition to Draft Standard.
1.2. Payload format for H.263 video
H.263 is a new ITU-T Recommendation for video compression. It is similar to H.261, but includes optimizations to support lower bit rates. An RTP payload format for H.261 has already been defined and is currently in IESG Last Call before publication as an RFC. It is not possible to use the H.261 payload format directly for H.263 because more bits are required to specify the macroblock address, the additional motion vectors for Advanced Prediction, and the additional fields for decoding interleaved "PB" frames.
Chunrong Zhu from Intel presented a proposed payload format for H.263 that copies some fields from the H.261 format and adds more. Supporting all options of H.263 requires 10 bytes of payload header, compared to 4 bytes for H.261, but fewer bytes are required when some of the options are disabled. Therefore, three different modes A, B and C are defined, each with a different payload header size (4, 8, and 12 bytes, respectively, to maintain 32-bit alignment). The three modes may be intermixed in a single stream to maximize efficiency as allowed by the data observed.
The only significant question raised regarding this proposal was whether the additional complexity compared to H.261 is justified. A packet format with optional fields requires extra tests in the data processing path, makes optimization difficult, and increases the likelihood of bugs. However, in this case, the smallest form of the payload header can only be used when a GOB will fit within a single RTP packet. The GOB size will often exceed the network MTU, so the larger forms of the payload header are required as well. The only way to simplify the format is to require that the maximum-size header be used all the time, which would be too inefficient, especially at low data rates. Therefore, the proposed design appears to be a reasonable compromise.
Comments from the working group are sought both on technical aspects of the design and on the completeness and clarity of the specification. Assuming successful implementation of the H.263 payload format specification, the draft should be submitted for publication as an RFC.
1.3. Payload format issues for redundant encodings
Mark Handley from UCL gave a presentation about work at UCL and INRIA to improve audio quality in the presence of packet loss through redundant encodings. The idea is to piggy-back one or more highly compressed redundant encodings of the audio data onto later packets of the primary encoding. That way, if a packet of the primary encoding is lost, a lower-quality version of the missing audio can be reconstructed from the redundant encoding in the later packet.
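The piggy-back idea can be illustrated with a small Python sketch. It models packets as dictionaries and does not represent any of the packet layouts under discussion; the function names and the single-packet redundancy offset are assumptions made for the example.

```python
def build_packets(frames, encode_primary, encode_redundant):
    """Attach to each packet a redundant encoding of the previous frame."""
    packets = []
    for seq, frame in enumerate(frames):
        redundant = encode_redundant(frames[seq - 1]) if seq > 0 else None
        packets.append({
            "seq": seq,
            "primary": encode_primary(frame),
            "redundant": redundant,
        })
    return packets


def reconstruct(received, total):
    """Rebuild the stream from the packets that arrived.

    `received` maps sequence number to packet. A missing packet is replaced
    by the lower-quality copy carried in the following packet, if that
    packet arrived; otherwise the frame is reported as lost.
    """
    out = []
    for seq in range(total):
        if seq in received:
            out.append(("primary", received[seq]["primary"]))
        elif seq + 1 in received and received[seq + 1]["redundant"] is not None:
            out.append(("redundant", received[seq + 1]["redundant"]))
        else:
            out.append(("lost", None))
    return out
```

Note that the receiver must wait for the next packet before it can repair a loss, so this kind of recovery adds at least one packet interval of delay when it is used.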
What's undecided is how to carry the redundant payloads in RTP, since RTP was only designed to carry a single payload. Three ideas were proposed:
The INRIA and UCL teams differ on the choice between the second and third schemes, so advice is sought from the working group. No clear-cut deciding factors were identified in the meeting. As the work progresses, it is expected that a draft leading to one or more new RTP payload format specifications will be produced.
1.4. Payload format for ASF streams
Tim Kwok from Microsoft gave a preliminary presentation on a new proposal for "Active Stream Format" that is about to be introduced by Microsoft. ASF is a multiplexing scheme for audio, video, images, scripts and URLs that is intended to serve for both storage and transmission. Microsoft is seeking input on how to packetize ASF in RTP, and would also like to include in ASF a format that can record a collection of RTP packet streams constituting a multimedia session.
Only a few details were presented since the announcement of ASF was to be made the next week and therefore the document describing it was not yet available. ASF is a "framework" that can include filters for translation between storage and transmission formats. It is intended to be transform independent, and will provide a variety of error concealment methods. One question is how these error concealment methods can be coordinated with the RTP layer.
The proposal was that an RTP payload type be allocated to indicate a multiplexed ASF stream. One motivation claimed for multiplexing in this manner is that it simplifies synchronization. However, this goes against the RTP notion that streams should be sent separately to allow receivers to select among them and synchronize the chosen ones using the timestamps provided in RTCP SR packets. Several meeting participants questioned the motivation for multiplexing. Perhaps the strongest supporting reason is that high-volume multimedia servers do not have time to take apart stored media and construct separate RTP streams.
Greg Minshall summarized the WG sentiment: the participants have a number of questions about the proposal, but would like to hear more details. Microsoft is to produce a draft proposing an encapsulation of ASF into RTP after the ASF document is available, and then this proposal will be discussed further in the working group.
1.5. Using dynamic payload types
Steve Casner presented one slide as a reminder to implementers to include support for dynamic payload types. Creators of new payload formats have asked for static payload types to be assigned, but the 7-bit space is not large enough to define a type for all who might request one. Instead, new encodings should be tested using dynamic payload types and might later have static types assigned if they prove to be important for interoperation among implementations.
The RTP A/V Profile (RFC 1890) defines payload types 96-127 to be dynamic per session. The mapping between these types and format descriptions in a larger space is conveyed in a session protocol. For example, the SDPv2 spec under development in the MMUSIC WG provides this function. It may be appropriate to register the format names to be used in that larger space, but that is not strictly an RTP issue since the names are carried in session protocols.
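As a rough illustration of the per-session mapping described above, the following Python sketch binds format names to payload types in the dynamic range 96-127. The class and the format names are hypothetical; in practice the bindings would be conveyed by a session protocol rather than chosen locally like this.

```python
DYNAMIC_PT_RANGE = range(96, 128)  # payload types 96-127 are dynamic per session


class DynamicPayloadMap:
    """Per-session binding of dynamic payload types to format names."""

    def __init__(self):
        self._by_pt = {}

    def assign(self, format_name):
        """Bind the lowest free dynamic payload type to format_name."""
        for pt in DYNAMIC_PT_RANGE:
            if pt not in self._by_pt:
                self._by_pt[pt] = format_name
                return pt
        raise RuntimeError("no dynamic payload types left in this session")

    def lookup(self, pt):
        """Return the format bound to pt, or None if pt is unassigned."""
        if pt not in DYNAMIC_PT_RANGE:
            raise ValueError("payload type %d is not in the dynamic range" % pt)
        return self._by_pt.get(pt)


if __name__ == "__main__":
    session = DynamicPayloadMap()
    pt = session.assign("example-video-format")  # hypothetical format name
    print(pt, session.lookup(pt))
```

Because the binding lives only for the session, two sessions can use the same payload type number for different formats without conflict.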
2. RTP and MBone monitoring
As RTP begins to see more use, we need to
learn how RTCP feedback can help in the operation of applications
and general monitoring of the MBone. Steve Casner introduced the
topic with a slide listing the information provided by RTCP: participant
descriptions, packet loss and jitter as seen by all receivers,
and propagation delay from the sender to each receiver. This information
may be combined with route mapping data to produce a graphical
display, as Paul Stewart did a year ago in his msessmon program.
Now mtrace could be used to collect more accurate routes.
2.1. Tracking session participants
Kevin Almeroth and collaborators at Georgia Tech have implemented a tool called mcollect to gather statistics about participation dynamics in MBone sessions. This program does not yet make use of the content of RTP or RTCP packets; instead it measures the start and end times of participation in a session based on source host addresses. A number of interesting observations of user behavior were made, such as "session surfing" and significant variations in connection patterns for different types of sessions. Kevin posed as an open issue the question of how to take advantage of the information carried by RTCP.
2.2. Using RTCP feedback
Andrew Swan responded to a last-minute request to say a few words about work at UC Berkeley on using RTCP. He pointed out that many interesting events have been transmitted on the MBone, but that we still need to be able to better diagnose problems in the multicast distribution. RTCP feedback will be an important part of the solution.
Collecting feedback with RTCP is easy. The hard part is figuring out how to analyze and present the information. The first idea, for example in presenting the loss rates seen by all receivers, is a table. But how should the table be sorted? Work is underway to implement and test different techniques, and will be presented in more detail in the future.
3. RTP header compression for low bandwidth
Internet Telephony is a rapidly growing
application, but the commercial products do not use RTP. Part
of the reason may be that they were developed before RTP was published
as an RFC, but another factor may be the bandwidth overhead imposed
by RTP. Even though minimizing overhead was a key consideration
in the design of RTP, for very low rate audio the 12-byte RTP
header may still be a problem.
Scott Petrack and Ed Ellesson of IBM
have proposed that the working group define "C/RTP",
a compressed-header form of RTP. Before the meeting, they sent
a preliminary draft to the mailing list outlining a framework
for C/RTP including examples of compression techniques that might
be employed. Scott gave a presentation on these ideas at the meeting.
To motivate this effort, Scott provided
a calculation showing that latency due to packetization delay
increases linearly with packet header size when bandwidth is constrained.
That is because one must increase the packet size if the header
size is increased in order to keep the overhead ratio constant
and not exceed the fixed bandwidth limit. Since highly compressed
audio signals may use frames on the order of 24 bytes, the 12
bytes of RTP header (plus 28 bytes of IPv4 and UDP headers) are
a significant overhead. Frames are generated every 20 or 30 milliseconds,
and to minimize latency it is best to send only one frame per packet.
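The arithmetic behind this argument can be worked through in Python using the figures quoted above (24-byte frames, 12 bytes of RTP header, 28 bytes of IPv4 and UDP headers). The function names are hypothetical, and the sketch ignores link-layer framing.

```python
def packet_stats(frame_bytes, frame_ms, frames_per_packet,
                 rtp_hdr=12, ip_udp_hdr=28):
    """Overhead ratio, bit rate on the wire, and packetization delay."""
    header = rtp_hdr + ip_udp_hdr
    total = header + frame_bytes * frames_per_packet
    interval_s = frame_ms * frames_per_packet / 1000.0
    return {
        "overhead_ratio": header / total,
        "bit_rate": total * 8 / interval_s,
        "packetization_delay_ms": frame_ms * frames_per_packet,
    }


def min_frames_for_overhead(header_bytes, frame_bytes, max_overhead_ratio):
    """Smallest number of frames per packet that keeps the header share of
    the packet at or below max_overhead_ratio (which must be above zero)."""
    k = 1
    while header_bytes / (header_bytes + k * frame_bytes) > max_overhead_ratio:
        k += 1
    return k


if __name__ == "__main__":
    print(packet_stats(24, 20, 1))
    print(min_frames_for_overhead(40, 24, 0.5))
```

With one 24-byte frame per packet, the 40 bytes of headers make up 40 of 64 bytes, or 62.5% of each packet. Holding the overhead ratio fixed means the payload, and therefore the packetization delay, must grow in proportion to the header size, which is the linear relationship described above.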
A variety of techniques for compressing
the header size were proposed. Constant information such as the
payload type and SSRC ID may be omitted if shared state can be
established via some form of reliable communication (out of band).
However, note that these values are constant only if the payload
format is not changed and if there is only one source sending
on the session (e.g., unicast). Some fields that are not constant
may change by fixed amounts for contiguous packets, depending
upon the media. It seems likely that the techniques designed by
Van Jacobson for compression of IP/TCP headers over SLIP links
may be applicable here, though some additional mechanism would
be required to take the place of the TCP retransmissions that
re-establish state after an error.
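A minimal Python sketch of the delta-based idea follows. It is not the C/RTP protocol, for which no specific proposal was made; the class names and the two-message scheme (a full header or an empty "compressed" marker) are assumptions chosen to keep the example short. It also assumes a lossless link, so it omits exactly the error-recovery mechanism that the text says would be needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RtpHeader:
    payload_type: int
    seq: int
    timestamp: int
    ssrc: int


class Context:
    """State kept identically by the compressor and the decompressor."""

    def __init__(self):
        self.last = None      # most recent header seen
        self.ts_delta = None  # timestamp step between the last two headers

    def predict(self):
        """Header expected next if all fields change by fixed amounts."""
        if self.last is None or self.ts_delta is None:
            return None
        return RtpHeader(self.last.payload_type,
                         (self.last.seq + 1) & 0xFFFF,
                         (self.last.timestamp + self.ts_delta) & 0xFFFFFFFF,
                         self.last.ssrc)

    def update(self, hdr):
        if self.last is not None and hdr.ssrc == self.last.ssrc:
            self.ts_delta = (hdr.timestamp - self.last.timestamp) & 0xFFFFFFFF
        else:
            self.ts_delta = None
        self.last = hdr


def compress(ctx, hdr):
    """Send nothing but a marker when the header matches the prediction."""
    message = ("COMPRESSED",) if hdr == ctx.predict() else ("FULL", hdr)
    ctx.update(hdr)
    return message


def decompress(ctx, message):
    hdr = message[1] if message[0] == "FULL" else ctx.predict()
    ctx.update(hdr)
    return hdr
```

In this sketch the first two packets of a stream go out with full headers (the second establishes the timestamp step), and later packets are compressed for as long as the payload type, SSRC, and step sizes stay the same.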
One issue is whether the compressed RTP
should be implemented in applications (as RTP often is) and used
end-to-end, or whether it should be implemented at the endpoints
of slow links, perhaps in PPP as with IP/TCP header compression.
It seems likely that for RTP header compression to be effective,
IP and UDP headers must also be compressed, which suggests the
latter approach. However, Mikael Degermark cautioned against tying
compression of all three layers into one mechanism if RTP compression
depends upon handshaking between the two ends. The UDP header
compression that has been designed as part of the IPNG effort
works over simplex connections without a handshake.
Scott's presentation did not include a specific
proposal for the C/RTP protocol, but rather was a call for the
working group to take up this problem as a work item. The general
sentiment at the meeting was that this would be appropriate. Comments
pro or con are solicited.
Scott also offered a GSMVQ audio encoding
algorithm for consideration as a format to be used with RTP.
4. Fostering industry adoption of RTP
The first topic that was raised in calling
for the working group to meet at this IETF was a discussion of
what should be done to foster industry adoption of RTP. However,
there was not sufficient time to organize participation in the
meeting by companies other than those who have been involved in
the working group for some time, so there was not a lot of discussion
here. Others cited adoption of RTP in Netscape's LiveMedia initiative
and in the H.323 recommendation of the ITU-T as evidence that
the industry is already paying attention to RTP. Nonetheless,
there may be issues such as the need for header compression that
are industry concerns. It was suggested that more industry
participation be organized for the next IETF in Montreal.
One proposal was to sponsor a Connectathon-like
event to test interoperation and conformance. Ross Finlayson countered
that it should not be necessary to organize an event in one place
for this. Instead, interoperability testing should be conducted
across the network.
5. AVT working group logistics
It became apparent at this meeting that
there is new work that should be addressed by the AVT working
group, and that AVT should continue to hold separate sessions
at IETF rather than merging with MMUSIC. In particular, it was
agreed that AVT should meet in Montreal. The new work includes:
On this last topic, during IETF week a new
draft on "RTP extension for Scalable Reliable Multicast"
(draft-parnes-avt-srm-00.txt) was posted to the mailing list by
Peter Parnes from LuTH/CDT. Everyone is encouraged to read that draft.
Since AVT's initial charter has been completed,
this new work needs an updated charter. The chair will prepare
a revised charter for comments.
6. Miscellaneous issues
Steve Casner brought up one last issue before
adjourning the meeting. In Section 8.2 of the RTP spec, an algorithm
is defined for detecting collisions of SSRC ID allocations and
loops induced by RTP mixers or translators. As defined, that algorithm
will work properly only if sources use the same UDP source port
number (not destination port number) for both RTP and RTCP packets
in a session. However, not all implementations of RTP do that;
the vat and vic programs from LBL are counterexamples. The algorithm
can be modified to allow the ports to be different, at the cost
of potentially locking on to the RTP packets from one source and
the RTCP packets from another. On the other hand, some people
have expressed the opinion that the algorithm is too complicated
and hard to test (because collisions should be very rare). This
is one area in which implementation experience is needed to determine
what changes, if any, should be made before the transition of
RTP to Draft Standard.
Due to time limits, there was no discussion of this topic at the meeting. Steve Casner intends to write an informational RFC on the issue so that implementers of RTP can be aware of it.
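The port issue can be illustrated with a simplified Python sketch. It is not the full algorithm from Section 8.2 of the RTP spec: it only records the source transport address first seen for each SSRC ID and flags any later mismatch. The class name and the `separate_rtp_rtcp` option are hypothetical.

```python
class SourceTable:
    """Remember the source transport address first seen for each SSRC ID."""

    def __init__(self, separate_rtp_rtcp=False):
        # When True, RTP and RTCP packets are tracked independently, so a
        # source may use different source ports for the two.
        self.separate = separate_rtp_rtcp
        self.table = {}

    def check(self, ssrc, src_addr, is_rtcp):
        """Return "ok", or "conflict" if this SSRC was seen from elsewhere."""
        key = (ssrc, is_rtcp) if self.separate else ssrc
        known = self.table.setdefault(key, src_addr)
        return "ok" if known == src_addr else "conflict"


if __name__ == "__main__":
    strict = SourceTable()
    print(strict.check(0xABCD, ("10.0.0.1", 4000), is_rtcp=False))
    # Same host, different source port for RTCP: reported as a conflict.
    print(strict.check(0xABCD, ("10.0.0.1", 4001), is_rtcp=True))

    relaxed = SourceTable(separate_rtp_rtcp=True)
    print(relaxed.check(0xABCD, ("10.0.0.1", 4000), is_rtcp=False))
    # Accepted, but RTCP from an unrelated host would be accepted too.
    print(relaxed.check(0xABCD, ("10.0.0.2", 4001), is_rtcp=True))
```

The relaxed table shows the cost noted above: once RTP and RTCP are tracked separately, nothing ties the RTCP packets to the same source as the RTP packets carrying that SSRC ID.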