Editor's note:  These minutes have not been edited.
 
Minutes of the Audio/Video Transport Working Group

Reported by Steve Casner

1.  Introduction and status

The AVT working group met for two sessions at the San Jose '96 IETF.
AVT has produced the Real-time Transport Protocol, which was published
in January 1996 as a Proposed Standard RFC 1889 along with the
companion RTP profile for audio/video conferencing RFC 1890.  The
minimum interval before advancement to Draft Standard has now passed;
during that time, RTP use has grown with additional independent
implementations and a broader range of applications.  Accordingly, at
this meeting we discussed what changes to the spec may be required for
advancement to Draft Standard.  In particular, the biggest topic was
RTCP scalability for very large sessions, especially with asymmetric
or low data rate links.  Also of significance for low-speed links was
the second major topic of this meeting, an update on the IP/UDP/RTP
header compression protocol.

Since the last meeting, RFCs 2029, 2032, 2035 and 2038 defining the
RTP payload formats for CellB, H.261, JPEG and MPEG video encodings
were also published as Proposed Standards, in October 1996.  In the
second AVT session, a problem with RFC 2038 regarding packet loss
resilience for MPEG2 was presented, as were RTP Payload Formats for
Redundant Audio, G.728 audio and H.263 video.

A status report on development of an RTP MIB was presented; this is a
potential future work item for the group, along with completion of the
changes for transition of the RTP spec to Draft Standard and
standardization of additional profiles and payload formats.


2.  RTCP scaling for large "broadcast" sessions

Before this IETF meeting, there were several messages to the AVT
mailing list regarding use of RTCP in very large "broadcast"
scenarios, especially those where endpoints are connected by
low-speed modems or asymmetric cable systems with low-speed uplinks.
Bernard Aboba and Henning Schulzrinne gave presentations on the
problems that arise along with a variety of possible solutions.  

The interval between RTCP packets increases with the group size so
that RTCP consumes a constant fraction of the session bandwidth.
This keeps RTP well behaved as it scales to large groups.  However,
there are scenarios for which the current spec is inadequate:

  - Because the initial RTCP report is sent within a fixed interval
    after joining the session independent of the group size, there
    can be a large bandwidth spike if many participants join at the
    same time, e.g., for a scheduled event.  Convergence to the
    proper report interval may take too long, in particular for
    low-bandwidth sessions.

  - For sessions larger than a few thousand, the long RTCP interval
    may preclude collection of the desired information.

  - The memory required by the suggested RTCP interval calculation
    algorithm to track all the participant SSRC identifiers may be
    excessive for very large sessions.

  - For some types of sessions, privacy of reception may be more
    important than the feedback RTCP provides.

  - On low-speed modems, giving even 5% of the bandwidth to RTCP
    draws cries from the people who have to squeeze the data to fit.

  - Network operators are concerned about the multicast routing state
    required to track all the participants as multicast sources.
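
The scaling rule described at the start of this section, and the
join-time spike in the first item above, can be illustrated with a
rough sketch.  The 5% share is the figure mentioned above and the
5-second minimum interval is taken from RFC 1889; the 100-byte average
report size and the 2.5-second initial report window are assumptions
made only for illustration, and the sender/receiver bandwidth split
and randomization of the real algorithm are omitted.

```python
def rtcp_interval(members, session_bw_bps, avg_rtcp_bytes=100,
                  rtcp_fraction=0.05, min_interval=5.0):
    """Seconds between RTCP reports from one participant.

    avg_rtcp_bytes is a hypothetical average report size; the
    sender/receiver split and randomization are left out.
    """
    rtcp_bytes_per_second = session_bw_bps * rtcp_fraction / 8
    interval = members * avg_rtcp_bytes / rtcp_bytes_per_second
    return max(interval, min_interval)


def join_spike_bps(joiners, first_report_window=2.5, avg_rtcp_bytes=100):
    """RTCP load if every joiner reports once within a fixed window.

    first_report_window is an assumed value, independent of group size.
    """
    return joiners * avg_rtcp_bytes * 8 / first_report_window


if __name__ == "__main__":
    # A 64 kb/s session: a small group and a 10,000-member group.
    print(rtcp_interval(10, 64000))
    print(rtcp_interval(10000, 64000))
    print(join_spike_bps(10000))
```

Under these assumptions a 10,000-member session on a 64 kb/s stream
reports only about once every 2500 seconds per participant, while
10,000 simultaneous joiners would briefly offer several Mb/s of RTCP.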

Several potential solutions for these problems were presented and
evolved in the discussion that followed:

  - The initial bandwidth spike can be avoided if the report interval
    is "reconsidered" before sending a report.  That is, sending is
    delayed until the updated group size estimate at the end of the
    interval is not much larger than it was at the start.  Group size
    estimation can also be accelerated by having each participant send
    the estimate it has counted, but use the maximum of all estimates
    received to calculate the interval.

  - Memory requirements may be reduced by storing only a sampling of
    the SSRC identifiers heard, based on a probability function.  The
    probability can be scaled as needed to fit the group size.

  - The report interval can be reduced by having only some of the
    participants report, perhaps controlled by a sampling function
    from the sender, if partial feedback is appropriate.

  - Receiver RTCP reports could be sent via unicast to the sender or
    some monitor, or sent on a separate multicast address, but the
    report interval must be controlled by some external means since
    the receiver can't calculate the interval itself.  There is a
    danger that if receivers allow some other entity to control their
    report rate and destination, this could be subverted to packet
    bomb some innocent party.

  - It is always safe (from a network load point of view) if
    receive-only participants do not send RTCP.  This avoids several
    of the problems listed, but precludes feedback.  The development
    of diagnostic tools such as mtrace makes RTCP less critical for
    network diagnosis than before.

  - Report bandwidth can be reduced by summarization of reports by
    selected receivers through self-organization using TTL scoping, or
    by RTP translators explicitly interposed in the distribution tree.
    This adds significant complication.
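
The "reconsideration" idea and the use of the maximum of received
group size estimates, from the first item above, can be sketched as
follows.  This is only an illustration of the idea as described in
these minutes, not a specified algorithm: the linear interval rule,
the growth tolerance of 1.25 and all function names are hypothetical.

```python
def interval_for(members, seconds_per_member=0.25, min_interval=5.0):
    """Hypothetical interval rule: grows linearly with group size."""
    return max(min_interval, members * seconds_per_member)


def group_size_estimate(own_count, received_estimates):
    """Use the largest estimate heard so new joiners converge quickly."""
    return max([own_count, *received_estimates])


def on_timer_expiry(last_sent, now, members_at_start, members_now,
                    growth_tolerance=1.25):
    """Decide whether to send now or push the report back.

    Returns ("send", now) or ("wait", new_deadline).
    """
    # Group did not grow much during the interval: report as planned.
    if members_now <= members_at_start * growth_tolerance:
        return ("send", now)
    # Group grew: recompute the deadline from the larger estimate.
    new_deadline = last_sent + interval_for(members_now)
    if new_deadline <= now:
        return ("send", now)
    return ("wait", new_deadline)
```

A participant that joined thinking it was alone, and then heard 1000
others before its first timer expired, would defer its report rather
than contribute to the initial spike.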

Many good points were brought up in the discussion of these possible
solutions:

  - Using a distributed consensus algorithm driven by multicast is
    robust because a single defector can't inflate the rate.

  - Encryption can be used for privacy, though this can add legal
    complications in some countries.

  - The current RTCP algorithm represents a design point that we
    believe works well for a wide range of applications and which has
    been tested in experiments on the MBone.  But there may be changes
    needed to extend operation to highly asymmetric networks or other
    scenarios outside that range.

Each of the potential solutions has some drawbacks, and there is not
one solution that is the clear answer in all cases.  Further
experimentation is needed to test the proposed ideas.  However, it was
agreed that the wording in the main RTP spec should be modified to
allow more flexibility in how RTCP may be used, and that different
modes should be defined and selected through the specification of (a
small number of) additional RTP Profiles.  These
profiles might be specified in the description of a session, e.g., in
SDP using "RTP/AVB" ("B" for broadcast) rather than "RTP/AVP".

Proposals are solicited in the form of Internet Drafts defining
additional profiles to be the subject of experimentation and then
consideration for standardization.


3.  IP/UDP/RTP header compression

Steve Casner presented an update on the proposal developed with Van
Jacobson for hop-by-hop compression of IP/UDP/RTP headers to allow
use over low-speed lines.  This proposal was introduced at the
previous AVT meeting in Montreal.  The update incorporates changes
agreed in Montreal and subsequent improvements:

  - The context ID is always sent, and is carried in the first byte
    to allow overlap with a link-level segmentation scheme when
    feasible.

  - The sequence number is increased to 4 bits and is changed to be
    context-specific rather than global.

  - A new delta encoding scheme was designed to fit the patterns
    encountered in typical RTP applications, replacing the temporary
    use of the scheme from RFC 1144 TCP/IP header compression.

  - RTP header compression is now integrated into the IPv6 header
    compression scheme (which also supports IPv4 and encapsulation in
    multiple headers).

  - It is now specified that RTCP packets will not be compressed,
    since the traffic fraction is small and the required increase in
    shared context would be impractical.

Manoj Leelanivas presented a report on his implementation of the
header compression algorithm and its performance.  He emphasized that
since the identification of which UDP streams carry RTP is only through
heuristics, it is essential to have a negative cache for UDP streams
that don't compress; otherwise, the context cache will thrash as each
new packet may fail to match any existing context.  The compression
scheme performed as expected: most audio packet headers compressed to
2 bytes without UDP checksum or 4 with; most video packet headers
compressed to 4 or 6 bytes.
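
The role of the negative cache can be sketched as follows.  This is
not the implementation that was presented; the class, the rule of
giving up on a flow after a fixed number of failed compression
attempts, and the least-recently-used eviction are all assumptions
made for illustration.

```python
from collections import OrderedDict


class ContextCache:
    """Compression contexts plus a negative cache of hopeless flows."""

    def __init__(self, max_contexts=16, max_failures=3):
        self.contexts = OrderedDict()   # flow -> context ID, LRU order
        self.free_ids = list(range(max_contexts))
        self.failures = {}              # flow -> consecutive failures
        self.negative = set()           # flows never given a context
        self.max_failures = max_failures

    def lookup(self, flow):
        """Return a context ID, or None if the flow is in the negative cache."""
        if flow in self.negative:
            return None
        if flow in self.contexts:
            self.contexts.move_to_end(flow)
            return self.contexts[flow]
        if self.free_ids:
            cid = self.free_ids.pop(0)
        else:
            # Evict the least recently used context and reuse its ID.
            _, cid = self.contexts.popitem(last=False)
        self.contexts[flow] = cid
        return cid

    def report(self, flow, compressed_ok):
        """Record whether the last packet of this flow compressed well."""
        if compressed_ok:
            self.failures.pop(flow, None)
            return
        count = self.failures.get(flow, 0) + 1
        self.failures[flow] = count
        if count >= self.max_failures:
            # Assumed heuristic: this UDP flow is not RTP, stop trying.
            self.negative.add(flow)
            self.failures.pop(flow, None)
            cid = self.contexts.pop(flow, None)
            if cid is not None:
                self.free_ids.append(cid)
```

Without the negative set, each packet of a non-RTP flow would claim a
context and could evict one that is compressing well.  A real
implementation would also need to bound and age the negative set.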

During discussion of the proposal, it was agreed that the delta
encoding should be specified to be table driven, with the default
table being as presented but with holes in the encoding space used to
encode negative deltas that may occur with MPEG or out-of-order
packets.  Also, for use of header compression over higher-speed lines
where there may be a larger number of contexts, either an 8- or 16-bit
context ID should be allowed via negotiation.  The working group
agreed that this proposal should go to Last Call with these revisions
in place.

Steve Casner also made a short presentation in the PPPEXT working
group meeting to coordinate the allocation of PPP packet types that
will be required for this compression scheme.


4.  Transport aspects of RTSP Proposal

The Real-Time Streaming Protocol is under discussion primarily in the
MMUSIC working group, but also has some transport aspects that were
discussed briefly in AVT.  The RTSP proposal has included a
reduced-size variant of RTP that was termed "compressed RTP".  Since
the AVT working group agreed in Montreal that the combined IP/UDP/RTP
compression scheme described in the previous section should be
standardized and that an end-to-end RTP-only compression scheme should
not, the RTSP authors will extract the definition of this RTP variant
into a separate non-standards-track interim proposal named CUSH, for
Compressed UDP Stream Header.  In the main RTSP specification, the
means of specifying the underlying stream transport protocol will be
made more flexible to provide better separation between control and
transport.


5.  RTP payload formats
 
The second AVT session covered topics beyond the main RTP spec,
primarily additional payload formats plus some potential new
application areas for RTP.

5.1  Redundant audio payload format

Colin Perkins gave an update on draft-perkins-rtp-redundancy-01.txt
which defines a mechanism for carrying redundant audio formats in RTP
to compensate for packet loss.  The redundant copy is usually more
heavily compressed to reduce overhead.  This updated draft includes
modifications suggested at the Montreal meeting.  Two interoperating
implementations are in regular use.  The working group agreed that
this proposal should go to Last Call after a few minor edits.

5.2  Payload format for H.263 video

Chad Zhu presented revisions to the payload format for H.263 video in
draft-ietf-avt-rtp-payload-02.txt.  These revisions reflect changes in
the ITU H.263 spec in addition to suggested clarifications and other
minor changes from the AVT group.  One change, prompted by the
development of H.263+, was to add a version number in the payload
format.  This change generated some discussion because it introduces
another multiplexing point in the processing of each data packet.  The
group concluded that it would be preferable to design the payload
format to accommodate the expected changes as well as possible, and
then assign a new payload type if an incompatibility arises.  The
H.263 payload format spec should also be ready for Last Call after the
version number is removed.

5.3  Problems with MPEG2 payload format

The payload format spec for MPEG1 and MPEG2 was recently published as
Proposed Standard RFC 2038.  Reha Civanlar has identified a problem
with this spec in that the packet loss resilience information is
insufficient for MPEG2.  He presented two recommendations for
supplying the necessary information:

  - An optional second payload-specific header word would be added to
    carry the additional picture layer information required, and
    redundant sequence and GOP headers would be transmitted
    periodically as needed to achieve the desired loss probability.
    The overhead would be less than 1% for a 4 Mbps data rate assuming
    2 sequence/GOP header retransmissions per second.

  - To reduce the redundancy overhead, the "high priority" header
    information could be sent using a "reliable" protocol prior to the
    main transmission of the video data using RTP.  Since inclusion of
    the additional redundancy information in the payload header is
    optional, use of this method does not require changes to RTP.
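
The overhead figure in the first recommendation can be checked with a
short calculation.  The sketch below uses only the numbers quoted
above (4 Mbps, 1%, 2 retransmissions per second) to find how large
each retransmitted set of sequence and GOP headers could be while
staying within that bound; the function name is hypothetical and no
actual header sizes are assumed.

```python
def max_header_bytes(data_rate_bps, overhead_percent, repeats_per_second):
    """Largest redundant header set (bytes) that fits the overhead bound."""
    # overhead budget in bits/s = data_rate_bps * overhead_percent / 100
    # divide by 8 for bytes/s, then by the number of repeats per second
    return data_rate_bps * overhead_percent // (100 * 8 * repeats_per_second)


if __name__ == "__main__":
    print(max_header_bytes(4_000_000, 1, 2))
```

That is, a 1% budget at 4 Mbps allows up to 2500 bytes of redundant
header data per retransmission.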

The group accepted the proposed extensions, and the chair asked Reha
to write up the proposed change for inclusion into the MPEG payload
format specification before its transition to Draft Standard.

Reha also proposed an alternate payload format for bundled MPEG audio
and video, described in draft-civanlar-bmpeg-00.txt.  This format
would give reduced overhead and better error resilience compared to
the encapsulation defined in RFC 2038 for MPEG1 Systems or MPEG2
Transport streams.  However, such improvements were considered in an
early draft of RFC 2038 and discarded.  The reasoning was that
applications holding data in the Systems or Transport stream formats
that can't afford the data handling required to benefit from separate
audio and video Elementary streams also could not afford to use the
more efficient bundled format.  Reha's proposal is put forth for
comparison testing and will be considered as an alternate format.

5.4  Payload format for G.728 audio

Ofer Shapiro has submitted text to be added to the RTP Audio/Video
Profile to describe the payload format for G.728 audio.  No separate
payload format spec is needed since only a few paragraphs are required
to describe the format.  The four 10-bit vectors per audio frame are
simply packed into 5 bytes, MSB first.  Multiple frames may be packed
into each packet.  This format has been accepted by the ITU study
group covering the use of RTP in H.323.
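
The packing described above can be sketched as follows.  This reads
"MSB first" as meaning that the first vector occupies the most
significant bits of the first byte, which is an interpretation of the
summary given here rather than the submitted text; the function names
are hypothetical.

```python
def pack_g728_frame(vectors):
    """Pack four 10-bit vector indices into 5 bytes, MSB first."""
    if len(vectors) != 4:
        raise ValueError("a frame holds exactly four vectors")
    bits = 0
    for v in vectors:
        if not 0 <= v < 1024:
            raise ValueError("each vector index must fit in 10 bits")
        bits = (bits << 10) | v
    return bits.to_bytes(5, "big")


def unpack_g728_frame(frame):
    """Recover the four 10-bit vector indices from a 5-byte frame."""
    if len(frame) != 5:
        raise ValueError("a packed frame is exactly 5 bytes")
    bits = int.from_bytes(frame, "big")
    return [(bits >> shift) & 0x3FF for shift in (30, 20, 10, 0)]


def pack_g728_payload(frames):
    """Concatenate several packed frames into one RTP payload."""
    return b"".join(pack_g728_frame(f) for f in frames)
```

Because each frame occupies a whole number of bytes, a receiver can
find the number of frames in a packet from the payload length alone.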


6.  New applications of RTP

Several proposals for new applications of RTP have recently been
submitted to the working group.  Two of them, a payload format for
carrying HTTP over RTP in draft-aboba-rtp-http-01.txt, and a proposal
for adding Scalable Reliable Multicast mechanisms to RTP, described in
draft-parnes-rtp-ext-srm-01.txt, fall into the general area of
reliable multicast which the Transport Area Directors want to organize
as a separate IRTF Research Group and later IETF Working Groups.  The
group discussed this plan and the fit of new work into the AVT
charter, but discussion of these proposals was deferred until
organization of the reliable multicast research area is sorted out.

6.1  Using RTP with caching

Two presentations on new applications were made at this meeting.  The
first was by Roger Kermode on the idea of layering audio and video
streams in time as well as quality, then combining this layering with
"proactive" caching to reduce latency and bandwidth requirements for
on-demand playback.  Multiple RTP streams (layers) from a particular
video subject would be used for the different access phases implied by
(near) on-demand access as well as for multiple quality levels as
desired by different receivers.  Receivers would combine multiple
streams from caches and original sources with local storage to produce
the desired quality of playback at the desired time.

This work is at the research stage, but is likely to draw on RTP
(possibly with SRM extensions), RTCP, RTSP and SDP as building
blocks.

6.2  Aggregation Service within RTP

The second presentation was by Jonathan Rosenberg on multiplexing
several RTP audio sessions into one packet stream.  This could be
used, for example, to increase packet efficiency substantially between
Internet Telephony gateways.  Such gateways have recently been
developed and deployed to allow long-distance telephone callers to
dial a local gateway which then establishes an RTP stream to another
gateway near the desired destination and instructs the remote gateway
to dial the callee's telephone.  There may be many parallel streams
of small packets between these gateways; aggregating these streams
into larger packets conserves header overhead without increasing delay
as would be the case for large packets in a single stream.
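
The header savings from aggregation can be illustrated with a rough
calculation.  The 40-byte figure is the combined size of the IPv4, UDP
and fixed RTP headers; the 20-byte audio payload and the 2 bytes of
per-stream information carried in the aggregate are hypothetical
values chosen for illustration and are not taken from the draft.

```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + fixed RTP header


def efficiency_separate(payload_bytes):
    """Fraction of each packet that is audio when streams are separate."""
    return payload_bytes / (payload_bytes + HEADER_BYTES)


def efficiency_aggregated(payload_bytes, streams, per_stream_bytes=2):
    """Fraction that is audio when streams share one set of headers.

    per_stream_bytes is an assumed cost for stream-specific fields.
    """
    total_payload = payload_bytes * streams
    overhead = HEADER_BYTES + per_stream_bytes * streams
    return total_payload / (total_payload + overhead)


if __name__ == "__main__":
    print(efficiency_separate(20))
    print(efficiency_aggregated(20, 10))
```

With these assumed sizes, separate streams spend two thirds of each
packet on headers, while ten aggregated streams spend under a quarter,
and the packetization delay of each stream is unchanged.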

Jonathan presented various methods for assigning individual streams to
logical channels in the aggregate and for carrying the information
that must remain specific to each stream.  The variations trade off
efficiency and the extent to which interpretation of fields in the RTP
header must change.  Full details of the options and efficiency
results are given in draft-rosenberg-itg-00.txt and .ps.  Further work
is needed before deciding whether some standardization should be
undertaken.


7.  Status report on RTP MIB

Stan Naudus gave a status report on the development of an RTP MIB.
While RTCP provides an efficient and scalable means to obtain feedback
from end systems in a multicast session, it does not provide third
party management of unicast sessions such as in Internet Telephony.
There may also be a need for additional remote control of RTP mixers
and translators beyond what would be practical to implement with RTCP.

The challenge in designing a MIB for RTP is to make it scalable over
the wide variety of applications in which RTP might be used.  The
design currently includes 8 tables and 62 objects.  Greg Minshall
questioned whether 62 was too many objects for a practical MIB.  The
goal of the authors is to continue work on the MIB, including
implementation testing to assess practicality, and submit it for AVT
consideration at the next meeting in Memphis.


8.  Advancing RTP to Draft Standard

As noted in the introduction, it is time to advance the RTP spec, RFC
1889, and the RTP Profile, RFC 1890, to Draft Standard.  Steve Casner
presented a list of (potential) changes to be addressed for
advancement:

  - changes for RTCP scaling as described earlier
  - rule changes proposed for layered encodings as described in
    draft-speer-avt-layered-video-01.txt 
  - revision of the loop/collision algorithm for separate RTP and RTCP
    source port numbers
  - some clarifications of the wording and small changes, such as
    allowing separate unicast ports, that have already been made in
    the spec source files
  - allowing separate multicast addresses for RTP/RTCP

These items are all straightforward or were previously discussed
except for the last.  Should the specification of an RTP session be
generalized to allow different addresses to be used for RTP and RTCP?
This would allow RTCP delivery to be constrained for improved
scalability in some scenarios.  After some discussion, the consensus
of the group was that we should take the conservative approach and not
make this change at this time.

In addition to any required revisions of the RTP specification,
advancing to Draft Standard requires evidence that interoperability
requirements have been met.  This should be easy to provide since
there are several genetically distinct implementations of RTP, both
research and commercial, in use.  The exact form required for the
evidence has not been determined by the Area Directors yet, but it is
likely that an applicability document will be needed to list the areas
in which RTP has been successfully used and the areas to which we
believe it can be extended.  We may also be required to document the
scalability of RTP in a manner similar to that proposed by the Area
Directors as a requirement for reliable multicast protocol proposals.


9.  Advancing RTP Profile to Draft Standard

Since the RTP Profile is a simpler document than the main RTP spec,
there are fewer issues for its advancement.  One question of
significance is whether the Profile should restrict the format of the
RTCP SDES CNAME item beyond the suggestions of the RTP spec so that
separate implementations of audio and video tools will be more likely
to use a common format.  This commonality is important to allow audio
and video streams to be associated for synchronization and other
presentation management.  The group agreed that the profile should
recommend use of only the numeric address form of the CNAME.
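
As an illustration of the numeric address form, a tool might build
its CNAME as sketched below.  The "user@host" layout follows the RTP
spec; the function name and the choice to reject host names outright
are assumptions, since the exact Profile wording has not been written.

```python
import ipaddress


def numeric_cname(address, user=None):
    """Build a CNAME of the form user@address, or address alone.

    ipaddress.ip_address() raises ValueError for anything that is
    not a literal IPv4 or IPv6 address, such as a host name.
    """
    host = str(ipaddress.ip_address(address))
    return f"{user}@{host}" if user else host
```

Separate audio and video tools run by user "doe" on the same host
would then both produce "doe@192.0.2.89" and could be associated by a
simple string comparison.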

In addition to this change, several of the simple audio payload
formats included in the Profile need additional details specified,
such as bit and byte order for packing.  Some additional formats,
such as G.723 and G.728, will also be added.


10.  Future work

The AVT meeting ended with a discussion of how much work remains,
since the original charter has been completed.  The group agreed to
meet again in Memphis, with the intention that work on new Profiles
for RTCP scaling and the new Payload Formats currently in progress
can be finished then.