2.6.1 Audio/Video Transport (avt)

NOTE: This charter is a snapshot of the 48th IETF Meeting in Pittsburgh, Pennsylvania. It may now be out-of-date. Last Modified: 17-Jul-00


Chair(s):

Stephen Casner <casner@acm.org>
Colin Perkins <csp@isi.edu>

Transport Area Director(s):

Scott Bradner <sob@harvard.edu>
Allison Mankin <mankin@east.isi.edu>

Transport Area Advisor:

Allison Mankin <mankin@east.isi.edu>

Mailing Lists:

General Discussion: rem-conf@es.net
To Subscribe: rem-conf-request@es.net
Archive: ftp://ftp.es.net/pub/mail-archive/rem-conf/

Description of Working Group:

The Audio/Video Transport Working Group was formed to specify a protocol for real-time transmission of audio and video over UDP and IP multicast. This is the Real-time Transport Protocol, RTP, together with its associated profile for audio/video conferences and payload format documents.

The current goals of the working group are to revise the main RTP specification and the RTP profile ready for advancement to draft standard stage (including the sampling algorithms for use with very large groups, which have been broken out into a separate document), to complete the RTP MIB, to produce a guidelines document for future developers of payload formats and to continue development of new payload formats.

The payload formats currently under discussion include a number of media specific formats (MPEG-4, DTMF, PureVoice) and FEC techniques applicable to multiple formats (parity FEC, Reed-Solomon coding).

Goals and Milestones:

Feb 99  Post RTP implementation checklist draft
Feb 99  Working group last call on guidelines for payload format writers (BCP)
Feb 99  Post revised draft on PureVoice (qcelp) payload format to address WG last call comments
Feb 99  Post payload format for MPEG-4 based on MPEG/IETF joint meetings
Feb 99  Working group last call on parity FEC draft (standards track)
Feb 99  Post revised RTP MIB and issue working group last call (stds track)
Feb 99  Post revised DTMF payload format draft, ready for WG last call
Feb 99  Post revised RTP membership (SSRC) sampling draft
Feb 99  Post revised Reed-Solomon draft
Feb 99  Post revised RTP spec and audio/video profile
Mar 99  Submit RTP MIB to IESG for publication as Proposed Standard RFC
Mar 99  New working group last call on PureVoice payload format
Mar 99  Submit guidelines for payload format writers for publication as a BCP
Apr 99  Working group last call on revised SSRC sampling draft (experimental)
Apr 99  Post final revision of RTP spec and A/V profile drafts
Apr 99  Analysis/simulation of multiplexing payload format proposals
Jun 99  Revise MPEG-4 payload format document after implementation experience
Jul 99  Prepare MPEG4 implementation results ready for WG last call
Jul 99  Decide how to proceed with multiplexing protocol: one generic payload format or a number of application specific formats
Jul 99  Working group last call on RTP and A/V profile (for Draft Standard)
Oct 99  Post final revisions of selected multiplexing protocol draft(s)
Nov 99  Working group last call on multiplexing payload format (stds track)


Request For Comments:

RTP: A Transport Protocol for Real-Time Applications
RTP Profile for Audio and Video Conferences with Minimal Control
RTP payload format for H.261 video streams
RTP Payload Format of Sun's CellB Video Encoding
RTP Payload Format for H.263 Video Streams
RTP Payload for Redundant Audio Data
RTP Payload Format for MPEG1/MPEG2 Video
RTP Payload Format for Bundled MPEG
Options for Repair of Streaming Media
RTP Payload Format for the 1998 Version of ITU-T Rec. H.263 Video (H.263+)
RTP Payload Format for BT.656 Video Encoding
RTP Payload Format for JPEG-compressed Video
Compressing IP/UDP/RTP Headers for Low-Speed Serial Links
An RTP Payload Format for Generic Forward Error Correction
Guidelines for Writers of RTP Payload Format Specifications
Sampling of the Group Membership in RTP
RTP Payload for Text Conversation
RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals
RTP Payload Format for Real-Time Pointers

Current Meeting Report

Minutes of the Audio/Video Transport working group

Reported by Colin Perkins and Stephen Casner.

The audio/video transport working group met twice at the 48th IETF in Pittsburgh. In the first session we discussed IP encapsulation, header compression, RTCP extensions, forward error correction, and MP3 audio. The second session was for discussion of the payload formats for AMR, SMPTE 292M, DSR, SONET over IP, and MPEG-4.

The meeting opened with an introduction and status update by Steve Casner. A number of drafts have been published as RFCs since the last meeting:

The RTP MIB, MIME type registration for the parity FEC payload format, and the payload formats for DV audio and video are awaiting publication. The payload format for G.722.1 is in working group last call; comments are solicited.

The working group last call on the revised RTP specification and audio/video profile completed in November 1999, with an agreed set of edits. These edits have now been completed. However, the issue of congestion control was raised: other protocols are required to have congestion control, so why not RTP? Following on from discussion in Adelaide, a number of changes have been made to the RTP specification and profile to address this concern. The RTP specification has text to note that congestion control is required, but that the requirements are different from those of TCP and are somewhat context dependent; the details of the required congestion control SHOULD be specified in profiles. The audio/video profile has text from Mark Handley which notes that:

Comments on whether the text in the drafts is sufficient, acceptable and appropriate are solicited.

There were several comments from meeting participants. Some felt this text was not specific enough, especially for the multicast case, while others felt it was too early to specify the details of what could be done without further experimentation. Perhaps that means this text is vague to the appropriate degree? In discussion on the mailing list shortly after the meeting, the primary concern was that the user may want an RTP stream to get more than its share of the bandwidth of a narrow link near the user. It is expected that QoS support should be used in that case so that appropriate service can be requested, as in the first case listed above.

The revised profile also includes changes received from the ITU for G.729 (new annexes) and G.723.1 (diagrams). The latest revisions are available as draft-ietf-avt-rtp-new-08.txt and draft-ietf-avt-profile-new-09.txt.

The chairs also noted that we have still to complete the RTP interoperability statement. A number of contact people were noted, and a meeting was arranged between the sessions to discuss this subject, and to assign work items (see below). Help from all implementors would be appreciated. The current drafts are draft-ietf-avt-rtp-interop-03.txt and draft-ietf-avt-profile-interop-01.txt.

There are also a number of drafts which accompany the RTP specification:

These drafts are believed to be complete. Anyone who has concerns about these drafts is urged to send comments to the mailing list.

Stephan Wenger made a brief presentation on the applicability of RFC 2429 to the next revision of H.263 -- known as H.263++ -- which is in progress. This revision adds a number of annexes to H.263:

Of these, annex U does not affect the RTP packetization. Annex V does not affect the packetization so long as each slice is split into a minimum of two RTP packets (header plus motion vectors, and up to 8 bits of the DCT info in one packet; rest of the DCT in another). Annex W has many new features, the only one relevant to the RTP packetization being header repetition. This can be supported by including the repeated header in the current picture header, so long as the header is smaller than the maximum size imposed by RFC 2429. The appendix on profiles and levels is intended to ensure interoperability without very extensive capability negotiation; these profiles and levels are referenced by MIME parameters. In summary, no technical changes are required to RFC 2429 to support H.263++, although some additional explanatory text on how to use the new features may be useful. It was suggested that this text could be added in an appendix to the payload format spec; Stephan said this would be done for the next meeting.

Steve Casner noted that MIME subtype video/h263-2000 was added to the draft-ietf-avt-rtp-mime-03.txt spec for H.263++, so there are two MIME subtypes referencing RFC 2429, but this does not introduce an incompatibility since old implementations would not understand the new annexes.

Tmima Koren presented enhancements to IP/UDP/RTP Header Compression (draft-ietf-avt-crtp-enhance-00.txt). There are three drafts which depend on these enhancements, two presented in AVT and a third on voice over MPLS. The change since the previous meeting is an additional section on bandwidth efficiency. Feedback is solicited on whether all, some, or none of these enhancements should be included in a revision of RFC 2508 for Draft Standard, especially from implementors of 2508 as to whether they would plan to implement these enhancements.

Tmima also presented IP/UDP/RTP Header Compression over AAL2 (draft-buffam-avt-crtp-over-aal2-00.txt). This is an alternative to the CRTP/PPP/PPPMUX/AAL5 multiplex (draft-ietf-avt-tcrtp-01.txt) which the AVT working group has agreed upon as the recommendation for multiplexing of RTP streams. The AAL2 method has comparable overheads for small voice packets but is more robust to cell loss. This is outside the scope of the AVT working group since what is required is a new ATM AAL2 SSCS identifier, but was presented here so the WG would be aware of this work. There was a comment that this falls into ITU-T Study Group 13 Question 5. A second comment was that this scheme would work for other RTP header compression schemes, such as ROCCO.

Mooi Chuah presented a Light Weight IP Encapsulation (LIPE) Scheme (draft-chuah-avt-lipe-01.txt). This is another multiplexing proposal aimed at reducing the bandwidth used and processing load (compared with the header compression/tunneling in the TCRTP framework) at the expense of reduced functionality. Although presented as a means of transporting voice frames, this work is actually a general purpose layer 2 multiplex based on a new transport protocol as an alternative to RTP. This is outside the scope of the AVT working group charter, so it would require a new charter or new working group. There was a comment that introducing a new transport protocol, with a new IP protocol number, is a step that must be considered carefully.

Timur Friedman presented some RTCP reporting extensions (draft-friedman-avt-rtcp-report-extns-01.txt). These allow reporting of considerably more detailed statistics than the standard SR/RR packets: loss bitmaps (RLE encoded), receiver timestamp, and delay since last receiver report. These last two allow receivers to calculate the round trip time to senders -- it was suggested that allowing receivers to send SR packets with the packet and octet count fields set to zero would have similar effect. The major open issue is the use of extended SR/RR blocks (the current approach) compared to the definition of new packet types. The authors are of the opinion that defining new RTCP packet types would be cleaner, and the consensus of the room agreed with them.
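As a rough illustration of the receiver-side round trip calculation described above, the sketch below assumes the extension works like the standard SR/RR mechanism: the sender echoes the timestamp from the last receiver report it saw and reports how long it held that report before replying. The function and parameter names (`receiver_rtt`, `lrr`, `dlrr`) are hypothetical and are not taken from the draft, and times are plain seconds rather than NTP-format fields.

```python
def receiver_rtt(arrival_time, lrr, dlrr):
    """Estimate the round trip time at a receiver, in seconds.

    arrival_time: local time at which the sender's report arrived
    lrr:          timestamp the receiver placed in its last report (echoed back)
    dlrr:         delay between the sender receiving that report and replying
    All names are illustrative assumptions, not fields defined by the draft.
    """
    rtt = arrival_time - lrr - dlrr
    # Clock granularity can make the result slightly negative; clamp to zero.
    return max(rtt, 0.0)


if __name__ == "__main__":
    # The receiver sent its report at t=10.000 s, the sender replied 0.250 s
    # after receiving it, and the reply arrived at t=10.330 s (receiver clock).
    print(round(receiver_rtt(10.330, 10.000, 0.250), 3))
```

Only the receiver's own clock is used for `arrival_time` and `lrr`, so no clock synchronization between sender and receiver is assumed.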

The next section of the meeting was devoted to discussion of RTCP backchannel proposals. Colin Perkins introduced this by noting the issues to be considered during the presentations:

The first RTCP backchannel proposal (draft-ietf-avt-rtprx-00.txt) was an RTP profile for RTCP-based retransmission request for unicast sessions presented by Koichi Yano. The changes made since the previous meeting include additions to the NACK packet format:

There are also a number of changes to the sender and receiver behavior:

Finally, a brief explanation of the changes needed to SDP and discussion of deployment with existing payload types was made. There have been no changes to the RTCP interval recommendation.

A number of issues were raised for discussion: On congestion control it was noted that although the draft describes how to calculate metrics like loss or RTT, it does not propose any particular scheme; should it? The author observed that it may not be possible to specify a single scheme because congestion control is dependent upon the payload and the environment. Also, are NACKs sufficient or might extended RR packets be needed also? RR is needed for sender RTT calculation, so rather than duplicating this functionality in the NACK packet, the author prefers to use both NACK and RR. What about use of this format for other purposes (FIR, NEWPRED)? The requirements for the RTCP interval should be similar, but should a new RTCP (sub-)type be defined? Should this profile be more general? This is to be resolved in discussions among the authors of these drafts.

It was asked if the authors had considered including information in the NACK packets to indicate the length of time for which a retransmission is useful? Yes, but the desire to keep the protocol simple was thought to outweigh the advantages of this (and Dave Singer noted that so long as the server is consistent, the receiver can figure out if the retransmission will be received in time and use the R bit appropriately). How big a fraction of the RTP bandwidth is used with small packets? The authors discuss use with 1 kByte packets, but audio packets are a lot smaller than this. The fraction would be higher with small packets, but Steve Casner noted that it may be acceptable (in a unicast scenario) to use more control bandwidth if that is the appropriate balance for the application. He also noted that we would like to have a single profile for retransmission, with multiple payload formats as needed. That profile should at least be able to specify the RTCP timing in common for all the payload formats, and if possible, common congestion control mechanisms as well, though perhaps with different RTT calculation methods. The profile may support multicast as well as unicast. Koichi countered that we know how to do unicast now, but multicast is not so clear, so it may have to be addressed later.

Roger Kermode noted that this work, and that in the other RTCP backchannel proposals, overlaps with that being done in the reliable multicast transport (RMT) working group and that coordination between the groups would be beneficial. The AVT chairs agree, and urge the authors of the backchannel drafts to study the work ongoing in RMT.

The next presentation (draft-miyazaki-avt-rtp-selret-01.txt) was by Carsten Burmeister: an RTP Payload Format for Multiple Selective Retransmissions. It was noted that patent applications exist which may pertain to this draft - an IPR statement has been received, and will be lodged on the IETF web site in the usual manner. This proposal is a payload format which would fit well with the profile specified in the previous proposal. This work allows for multiple and selective retransmissions in an attempt to solve the following problems: how does the receiver know if a lost packet should be retransmitted (since not all packets are of equal importance)? How does the receiver detect a lost retransmitted packet?

To solve these problems, the concept of a second sequence number (SSN) is introduced. At the sender each packet is marked with an SSN-indicator and SSN. If the packet is `important' and should be retransmitted if lost, the SSN-indicator is set to one and the SSN is incremented; otherwise the SSN-indicator is set to zero and the previous value of the SSN is included in the packet. On reception, the receiver compares the SSN in the packet with the stored value from the previous packet: if it is the same, nothing important has been lost. If it equals the previous value plus one, and the SSN-indicator is one, an important packet has been received correctly. If it equals the previous value plus one, and the SSN-indicator is zero, an important packet - with SSN equal to that in the just received packet - was lost and a retransmission request should be made. The claimed advantages of this approach are selective retransmission (only important packets are retransmitted) and support for multiple retransmissions (retransmitted packets get a new SSN value, the receiver is able to detect lost retransmissions).
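The marking and detection rules above can be sketched as follows. This is a minimal illustration, not the draft's packet format: the class names, the starting SSN value of zero, and the "unspecified" result for cases the minutes do not describe (larger gaps, wrap-around of the SSN field) are all assumptions.

```python
class SsnSender:
    """Marks outgoing packets with (SSN-indicator, SSN) as described above."""

    def __init__(self):
        self.ssn = 0  # assumed starting value

    def mark(self, important):
        if important:
            self.ssn += 1          # important packet: new SSN, indicator set
            return 1, self.ssn
        return 0, self.ssn         # otherwise: repeat the previous SSN


class SsnReceiver:
    """Classifies each arriving packet using the stored previous SSN."""

    def __init__(self):
        self.last_ssn = 0  # assumed to match the sender's starting value

    def on_packet(self, indicator, ssn):
        if ssn == self.last_ssn:
            result = "nothing_lost"
        elif ssn == self.last_ssn + 1 and indicator == 1:
            result = "important_received"
        elif ssn == self.last_ssn + 1 and indicator == 0:
            result = "important_lost"   # request retransmission of this SSN
        else:
            result = "unspecified"      # not covered by the description above
        self.last_ssn = ssn
        return result


if __name__ == "__main__":
    sender = SsnSender()
    marks = [sender.mark(imp) for imp in (False, True, False, True, False)]
    receiver = SsnReceiver()
    # Drop the fourth packet (an important one) in transit.
    for i, (indicator, ssn) in enumerate(marks):
        if i != 3:
            print(indicator, ssn, receiver.on_packet(indicator, ssn))
```

In this run the loss is detected when the fifth packet arrives carrying SSN 2 with the indicator clear, which is the third case in the description.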

It was asked what happens if the NACK packet is lost: is there a timeout to retransmit it? No, if the receiver gets other packets with a SSN, it knows the packet is still lost and can retransmit the NACK. How about using ACK, which is more reliable if the error rate is high? Yes, but ACK takes more bandwidth.

Stephan Wenger noted that there are only two priority groups here, and it is not easily enhanced to more than two groups. Maybe more than two levels are needed? The authors believe there are only two meaningful priorities: retransmit or don't. Steve Casner said that if there are several packets lost and a limited retransmission budget it may make sense to choose to send the most important vs. medium importance packets. He also noted that it may be possible to achieve the same effect using layering by sending "important" packets on a different port, with a separate sequence number space, and that this might avoid the need for the second sequence number.

Next, Stephan Wenger presented RTCP-based feedback for predictive video coding (draft-wenger-avt-rtcp-feedback-00.txt). This draft discusses why feedback is needed for predictive video coding, and how to provide feedback in a timely manner for both multicast and unicast sessions. The motivation for support of multicast and unicast together is to avoid specialized solutions wherever possible, and to provide a solution which degrades smoothly as the number of members increases. This is "the RTP way".

The draft defines a number of feedback message types: picture loss indication, slice loss indication and reference picture selection indication. The choice of when to send feedback is based on:

The result is a formula which allows for sending after a random interval which is a function of the group size, retaining timely feedback for unicast sessions. Feedback requests inhibit other receivers, preventing implosion.

There are several open issues on which comments from the WG are solicited: Do we want to use feedback suppression, as in SRM, or do we want feedback from multiple receivers to know how bad the loss problem is? The spec currently defines feedback only for video, but it should work for other media as well, should we do that? The proposal currently specifies that the feedback message is always combined with RTCP RR; should we do that? Or should the RTCP bandwidth budget be split between the two, maybe 50/50?

Steve Casner asked if the needed RTT calculation is possible with the delay mechanisms available? Yes, pretty sure this is true. It was asked if the draft assumes real time encoding, rather than streaming from a file where the buffer can be bigger? Yes, this is mostly intended for real-time. Henning Schulzrinne noted that this may be quite useful for one of the recurring discussions in the SIP working group regarding DTMF tones, where there are different perceptions on whether FEC or retransmission is appropriate; analysis is needed to see which is better. Does this assume only one packet loss per interval since the second packet cannot be NACKed? Yes. Jörg Ott replied that in unicast multiple NACKs are allowed while still keeping within the bandwidth limit, but in multicast they are not. That is the tradeoff for scaling, and another receiver will likely send the needed NACK anyway. Simulations showed this works well unless the loss is as high as 10%, and congestion control rules may say you should cease transmission above that point.

Finally in this section of the meeting, Shigeru Fukunaga gave a presentation on low delay RTCP for backward messages (draft-fukunaga-low-delay-rtcp-00.txt). This is the RTCP format for NEWPRED backchannel messages which was extracted from draft-ietf-avt-rtp-mpeg4-es-02.txt. The main features are as follows:

A number of issues were noted for discussion: the features of this proposal are similar to Koichi Yano's proposal - e.g. low delay, unicast - and it is possible to use that proposal as a profile of low delay RTCP after some modifications (define NEWPRED RTCP under RTP-RX profile). What about Stephan Wenger's proposal? Not considered low enough delay which is very important for the performance of NEWPRED; does not use ACK messages; and multicast may not be useful for NEWPRED. Stephan disputes the claim that very low delay is required, and reminded the list that he had recently sent a spreadsheet to the mailing list with his analysis. For a codec running at 10 frames per second, an extra 100ms delay only increases the size of the refresh frame from 2x to 2.5x the normal frame size.

Steve Casner pointed out this disagreement will not be settled in the AVT meeting; the answer must come from those doing the video compression work, which may mean from another body. If there is no single answer, it may mean we have to specify multiple mechanisms. The chairs asked the authors of the RTCP backchannel drafts to consider if they could be merged into a single specification or at least a common framework/profile that can work for all these schemes, since there is believed to be much overlap between the proposals. This needs to be done soon because of timing constraints for working with other bodies, so the authors were requested to meet later in the day. The authors agreed to discuss this in a separate meeting, and report back to the second session of the AVT meeting (see below). Ron Frederick commented that it should at least be possible to agree on a common RTCP NACK/ACK packet format, possibly for either unicast or multicast.

Ross Finlayson presented a more loss-tolerant RTP payload format for MP3 audio (draft-ietf-avt-rtp-mp3-02.txt). This format is a data preserving rearrangement of MP3 frames which removes backpointers, making the result more loss tolerant, and optionally provides for interleaving. Changes since the previous draft are that each ADU is now preceded by a descriptor giving its length, and the packing is no longer compatible with RFC 2250, since the 11 bit syncword is used for interleaving. It was already the case that a 2250 receiver could not process the MP3 payload format without the ADUs being rearranged, so all that is given up by this change is some potential for code re-use. The interleaving scheme provides for an explicit description of the ADU order, with an interleave index and cycle count. This differs from the general interleaving proposal submitted on the mailing list by Orion Hodson, which is not specific to a particular codec and gives an algorithmic description of the interleaving. There was some discussion over the merits of a general purpose interleaving format vs inclusion into particular formats. Ross claims that explicit interleaving is a benefit for MP3 because of the variable frame rate and variable number of frames per packet. He would also like to avoid a dependency on progress of Orion's proposal. Feedback on this would be appreciated.
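To show what an explicit description of the ADU order might look like at a receiver, here is a small sketch that restores order from an interleave index and a cycle count. It assumes each ADU arrives tagged with both values and that a change in cycle count marks the start of a new interleave group; the tuple layout and the function name `deinterleave` are illustrative assumptions, and field widths are deliberately not modelled.

```python
def deinterleave(adus):
    """Restore ADU order from explicit interleave tags.

    adus: iterable of (cycle_count, interleave_index, payload) tuples in
    arrival order. ADUs sharing a cycle count form one group and are
    emitted sorted by interleave index once the cycle count changes.
    """
    ordered, group, current = [], [], None
    for cycle, index, payload in adus:
        if current is not None and cycle != current:
            # Cycle count changed: flush the finished group in index order.
            group.sort(key=lambda item: item[0])
            ordered.extend(p for _, p in group)
            group = []
        current = cycle
        group.append((index, payload))
    # Flush whatever remains in the last group.
    group.sort(key=lambda item: item[0])
    ordered.extend(p for _, p in group)
    return ordered


if __name__ == "__main__":
    arrived = [(0, 0, "a"), (0, 2, "c"), (0, 1, "b"), (1, 1, "e"), (1, 0, "d")]
    print(deinterleave(arrived))
```

Because the order is carried in each ADU rather than derived from a fixed pattern, a group may contain any number of ADUs, which fits the variable number of frames per packet noted above.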

Future steps include an update of the implementation to use the new ADU descriptor (and an open source code release), plus further tests of the interleaving. Ross has submitted a minor revision to the draft to change the number of bits in the interleave field. This payload format is thought to be ready for working group last call once the draft gets posted.

The second session started with a discussion of unequal error protection and FEC which was held over from the previous day due to lack of time.

The first presentation was by Adam Li on an RTP Payload Format for Generic FEC with Uneven Level Protection (draft-li-ulp-00.txt). It was noted that data in multimedia packets is not all of the same importance, and with some common encodings, the most important data is at the beginning of a packet. Hence this draft proposes a scheme for protecting just that data, and for layered protection (where the data earlier in the packet has more protection than that later), which is derived from the RFC 2733 parity FEC format.
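The idea of protecting only the start of each packet can be sketched with a simple XOR parity over packet prefixes, in the spirit of parity FEC. This is an illustration under stated assumptions and not the draft's packet format: the function names, the zero padding of short packets, and the single-loss recovery model are all hypothetical.

```python
def xor_prefix_parity(packets, protect_len):
    """XOR the first protect_len bytes of each packet (zero-padded if shorter)."""
    parity = bytearray(protect_len)
    for pkt in packets:
        prefix = pkt[:protect_len].ljust(protect_len, b"\x00")
        for i, byte in enumerate(prefix):
            parity[i] ^= byte
    return bytes(parity)


def recover_prefix(received, parity):
    """Rebuild the protected prefix of the single missing packet in a group.

    XORing the parity with the prefixes of all surviving packets cancels
    them out and leaves the prefix of the missing one.
    """
    return xor_prefix_parity(list(received) + [parity], len(parity))


if __name__ == "__main__":
    group = [b"HDR1payload-one", b"HDR2payload-two!", b"HDR3x"]
    parity = xor_prefix_parity(group, 4)      # protect only the first 4 bytes
    survivors = [group[0], group[2]]          # the second packet was lost
    print(recover_prefix(survivors, parity))  # prefix of the lost packet
```

Layered protection could be approximated by computing several such parities with different prefix lengths, so that earlier bytes are covered by more parity data than later ones.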

It was asked how format independent this payload format is? The encoder needs to know the format of the media stream, the decoder does not, and it can be used for any format with the required properties. Jonathan Rosenberg asked a similar question: how useful is this? He is not so sure that many codecs have the most important information at the start of the packet (e.g. H.263 has the data spread throughout, although the H.263++ extensions change this). Defining a new generic payload format may not be worthwhile if it is just applicable to H.263++. Reha Civanlar suggested layered coding; both techniques require partitioning of the data, but layered coding allows differential network QoS as well. Concern was raised that there may be an IPR issue with this work - clarification is sought.

Bernhard Wimmer presented draft-lnt-avt-uxp-00.txt, an RTP Payload Format for Erasure-Resilient Transmission of Progressive Multimedia Streams. The aims of this work are similar to those of the previous, except that an interleaved Reed-Solomon encoding is adopted (rather than parity FEC). Ron Frederick asked how much knowledge of the encoding scheme needs to be transmitted? How does one signal the amount of redundancy used per row? They have defined a scheme to signal how much redundancy is transmitted in each row. It is dynamic from frame to frame. Dave Singer asked about the effects of this encoding on congestion control: when multimedia data is a significant fraction of the traffic, increasing the bandwidth for redundancy coding just causes more loss. Has there been an analysis of what would happen if there were many streams using this scheme and filling a link? No. Colin Perkins pointed out that we in AVT need to address congestion control. Over and above the basic requirements on RTP, this payload format has a feedback channel which adds data in the presence of congestion. Allison Mankin said an option might be that this is dependent upon using Endpoint Congestion Management (ECM WG).

Since this draft was not offered in accordance with section 10 of RFC 2026, the chairs declined to progress it in the working group at this time. The authors were requested to clarify the IPR issues. Wimmer said a clarification would be sent to the mailing list.

The next area of discussion was the RTP payload format for the AMR speech codec. This was opened by discussion of the liaison statement received from ETSI SMG2 (now part of 3GPP) delivered by Peter Barany. The liaison statement raises a number of procedural questions for the group:

In addition, concern was expressed that the proposed payload format may not be able to deal gracefully with a feature of the AMR codec, where frames which have bit errors (for example, due to poor wireless reception) are marked as damaged, but still transported.

There are two aspects to this latter question: can damaged frames be delivered to an application, and can those frames be marked during later transit? The first issue is not one the AVT working group can resolve, and is rather for the lower layer protocol stack designers (e.g. the recent UDP-Lite proposal); the second can be solved by use of a `damaged frame' bit in the RTP payload header which is set when the payload contains errors (there are cross layer issues to resolve, since this bit must be set in transit, but again these are outside the scope of the AVT working group).

Allison Mankin said that the UDP-Lite proposal did not get much support from link layer folks, but maybe now this is evidence of a link that would support it. Allison suggested approaching the transport area directors about following up on this, probably not in AVT but possibly.

Ron Frederick noted that RTP translators and mixers would have to be aware of this `damaged frame' bit also, and that the interactions here are unclear (e.g. when mixing erroneous and non-erroneous data). It was also noted that AMR includes the existing GSM FR/EFR codecs as a subset, and the `damaged frame' indicator is needed there also, which may be an issue. Would these need to have revised payload formats which are indicated by dynamic payload types?

Following this introduction we had presentations on the two proposed payload formats for AMR. First was draft-sjoberg-avt-rtp-amr-01.txt presented by Johan Sjöberg. This draft is a merger of the Ericsson and Nokia drafts from the last meeting. The main open issue among the authors is that this format currently supports 4 different Comfort Noise types. We probably want to reduce this to just the one AMR-specific CN format, and switch payload types if other CN formats are needed. Steve Casner endorsed this approach.

The second payload format is draft-fingscheidt-avt-rtp-amr-00.txt together with a companion document that defines the MIME type for storage and RTP transport, which were presented by Bernhard Wimmer. This format provides for redundancy and parity FEC via a novel scheme. It was asked why existing standards (e.g. RFC 2198) could not be used. The claim is that the additional overhead of these more general purpose schemes is not acceptable. The issue is one of efficiency versus generality, and it is unclear if the trade-off selected for this draft is appropriate. In addition, the format defines its own frame type field: it was suggested that using RTP payload types here would be more appropriate.

Henning Schulzrinne asked about IPR for this codec and availability of reference code. The code is downloadable from 3GPP, both fixed point and soon floating point. Bernhard did not know the IPR status.

The chairs noted that these two payload formats need to be merged into a single proposal. This is the primary gating item for meeting the short timeline that was requested. Bernhard said the authors had a discussion before the meeting and should have a solution soon.

Ladan Gharai presented an RTP Payload Format for SMPTE 292M video (draft-ietf-avt-smpte292-video-00.txt). This is the serial digital interface used for uncompressed HDTV at a data rate of 1.485 Gb/s with source formats of SMPTE 260M, 295M, 274M and 296M. The payload format is relatively simple, although the high data rate makes for a couple of unusual features: the RTP sequence number is extended to 26 bits in the payload header, to give a wrap-around time of ~170 seconds (a 16 bit sequence number wraps in less than 1 second). Also, unlike other video formats, this one uses a 10MHz timestamp to allow for precise timing reconstruction of the serial bitstream.
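The wrap-around figures can be checked with simple arithmetic. The minutes do not state the packet size, so the payload sizes below are assumptions, and the model ignores header overhead: it simply divides the sequence number space by the packet rate implied by 1.485 Gb/s.

```python
def wrap_seconds(seq_bits, rate_bps, payload_bytes):
    """Seconds until a seq_bits-wide sequence number wraps.

    Assumes one sequence number per packet, a constant bit rate, and
    payload_bytes of media data per packet (an assumed value; header
    overhead is ignored).
    """
    packets_per_second = rate_bps / (payload_bytes * 8)
    return (2 ** seq_bits) / packets_per_second


if __name__ == "__main__":
    rate = 1.485e9  # bits per second, from the text above
    for payload in (470, 1400):  # assumed payload sizes in bytes
        print(payload,
              round(wrap_seconds(16, rate, payload), 3),
              round(wrap_seconds(26, rate, payload), 1))
```

Under this model a 16 bit sequence number wraps in well under a second for either payload size, and the quoted ~170 seconds for 26 bits corresponds to payloads of roughly 470 bytes; larger packets would give a longer wrap-around time.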

Steve Casner asked if the format really needs the extended sequence number, since this may cause confusion with respect to the extended sequence number in RTCP. It may be that a one second wraparound time is not an issue, since the network can't buffer that many packets, or it may be feasible to infer the sequence number epoch from the timestamp. David Richardson noted that, whilst that may be so, the larger space may be needed to reference frames in other packets for error correction or retransmission, or for references between layers in layered coding. This is an area they are still working on.

Ladan also presented an AC3 audio payload format, draft-gharai-ac3-01.txt. AC3 is the audio format used with HDTV. There was not much change to this draft since it was last presented. The authors are seeking comments, but there are no specific open design issues.

Jeff Meunier presented draft-meunier-avt-rtp-dsr-00.txt, an RTP Payload Format for transport of ETSI ES 201 108 Distributed Speech Recognition streams. This is similar to sending compressed audio except that not all of the information required for playback is included, just what is necessary for speech recognition processing. Being new to RTP, the author solicited guidance on use of header compression, SDP/SIP usage and gotchas, robustness to packet loss and error concealment (in particular considerations for RTP over IP vs mobile IP), and packetization. In particular, the author is seeking public domain RTP implementations to help them get started.

It was noted that maybe we don't need to send the multiframe header which is defined in this draft, since it is currently constant. Rather, it may be possible to omit it from the stream and send the data out of band. There was no out-of-band channel when the ETSI spec was being written (for non-RTP channels). If you gateway to a non-RTP channel, the gateway would need to re-insert this info in-band. A second version of the ETSI standard may have time-changing parameters in this header.

Steve Casner noted that the current draft seems reasonable, but asked if the bandwidth savings (compared to using a normal codec) are worthwhile. The answer was yes, in particular within a wireless environment, because this encoding is 4.8kbps vs the 32kbps needed to do this with a normal audio codec of sufficient quality. The saved bandwidth is needed for other data exchange in parallel with the speech recognition.

It was asked how much support there is from speech recognition vendors to use this common format. They tend to be very guarded about their implementations, and they don't care much about interoperation. This proposal has gotten some support, but it is a chicken-and-egg problem between handset providers and application providers. There is agreement on this algorithm as the front-end.

The next presentation was preliminary work on an RTP Payload Format (http://www.tcb.net/plip/draft-white-sonet-format-rtp-00.txt) to carry SONET traffic. The motivation is to provide transport of legacy SONET data on IP networks where the lower layers do not support SONET, such as IP over photon or 10 gigabit ethernet. Although somewhat outside the traditional focus of the AVT working group, this appears to be a sensible application of RTP, in some ways related to the multiplexing proposals we have received in the past which seek to emulate a T1 circuit. The authors want to take advantage of the timing and sequencing functions of RTP in addition to more advanced functions like generic FEC. There may have been some confusion about the sequence number cycling back to 1 for the first packet of each frame, but it was explained that the sequence number must increment continuously across frame boundaries.

Ron Frederick explained why the M bit marks the last rather than the first packet of a frame for video, and that this choice might be helpful for similar reasons with this format. It was also asked to what extent this format will preserve SONET SLA characteristics. It is not intended that SONET error recovery mechanisms will be supported. Only the SONET payload will be carried in RTP, and the overhead would be recreated at the far end. IP error recovery mechanisms (e.g., re-routing) would be used instead. It was also noted that a similar technique could be used to support transport of ATM over RTP, should the need arise.
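The two framing points above (sequence numbers that increment continuously across frame boundaries, and an M bit on the last packet of a frame) can be sketched from the receiver's side. One commonly cited benefit of marking the last packet is that the receiver knows a frame is complete as soon as that packet arrives, without waiting for the next frame to start. The sketch below is illustrative only: the tuple layout and function name are hypothetical, packets are assumed to be already in sequence order, and nothing here is taken from the SONET draft.

```python
def assemble_frames(packets):
    """Group (seq, marker, payload) tuples into (frame_bytes, intact) pairs.

    Assumptions: packets arrive in sequence order, seq is a 16-bit counter
    that runs continuously across frames, and marker is set on the last
    packet of each frame.
    """
    frames = []
    current = []
    expected = None
    intact = True
    for seq, marker, payload in packets:
        if expected is not None and seq != expected:
            intact = False  # gap in sequence numbers: frame is damaged
        expected = (seq + 1) & 0xFFFF
        current.append(payload)
        if marker:
            # Last packet of the frame: hand it on immediately.
            frames.append((b"".join(current), intact))
            current = []
            intact = True
    return frames


packets = [
    (65534, False, b"a"), (65535, True, b"b"),  # frame 1
    (0, False, b"c"), (1, True, b"d"),          # frame 2, after wraparound
    (2, False, b"e"), (4, True, b"g"),          # frame 3, packet 3 lost
]
frames = assemble_frames(packets)
```

Because the sequence number does not restart at each frame, the loss of packet 3 is detected as a gap and the third frame is flagged as damaged rather than silently delivered short.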

The final area for discussion in the meeting was a framework document for the transport of MPEG-4 on IP (draft-singer-mpeg4-ip-00.txt), presented by Dave Singer. This is intended to provide a common framework for transport of any part - or all - of the MPEG-4 system on IP networks, to agree on that which can be agreed on, collect ideas and current practise, and is intended to be non-controversial. It refers to existing drafts and specifications only, explaining how they can be combined and what options exist.

In terms of RTP payload formats for MPEG-4, the framework document suggests that one, simple, base-level scheme ought to be available which can transport any MPEG-4 stream (possibly non-optimally). Any receiver ought to be able to receive this format, and it's good if senders can be persuaded to send it. However, the draft also endorses the development of other formats which can be optimized for particular media types. MPEG are working on the development of a generic format, to be based on draft-ietf-avt-rtp-mpeg4-03.txt, and we have a media specific format in draft-ietf-avt-rtp-mpeg4-es-02.txt being developed in the IETF.

Steve Casner noted that the chairs endorse this basic approach, and the consensus of the meeting was also that this was valid (although it was noted that some in the MPEG community would prefer a single format only). Several MPEG committee members were present. Zvi Lifshitz said he/they would take this approach back to MPEG to seek agreement. The timetable is an MPEG-over-IP Ad Hoc group meeting in September and a full MPEG committee meeting in October.

Guido Franceschini asked if we could resolve the choice between "application" and "video" MIME type for MPEG-4. Some people felt that video is appropriate because MPEG-4 is used primarily for visual presentations, but others counter that MPEG-4 can carry active content such as Java. Discussion continues on the mailing list.

Allison Mankin noted that this generality has caused considerable concern in the IESG regarding security due to the ability of MPEG-4 to transport active content (Java, ECMAscript, etc) and other potentially dangerous media. This aspect of any MPEG-4 payload format(s) will be reviewed thoroughly. In particular, in the view of the Area Directors the current Security Concerns section of the generic format (draft-ietf-avt-rtp-mpeg4-03.txt) is not sufficient. Allison asked that the next versions of the drafts posted for the WG have enhanced security sections so they may be reviewed. Colin asked for comments on all the drafts, both complaints and statements of support so we know which direction to take.

The discussion on RTCP backchannel messages in the first day's meeting led to a separate meeting of the authors of those drafts, to consider how they could be merged. Jörg Ott presented a summary of these discussions. It was agreed that there is a good set of aggregate functionality in these documents, but the authors couldn't agree on what should be in the baseline document. The major issue is whether to include support for multicast. Some desire two drafts: one purely for unicast, one which also supports multicast. It was noted that timing is important due to market pressure, and multicast is perceived by some to need longer consideration. Steve Casner noted that the chairs do not want to assume from the start that this will be two drafts; the aim is to try to produce a single document which supports both multicast and unicast. The authors agreed to try to merge the drafts, with a target completion date of December 2000 (last call after next IETF meeting).

Colin Perkins presented a summary of the discussions on completing the RTP interoperability matrix, which occurred between the two sessions. It is believed that we now have a good handle on the RTP protocol interoperability statement, with a number of implementors volunteering to provide input. We are less advanced with the profile interoperability statement, although some progress was also made there. Of particular importance is that implementors of the more advanced audio and video payload formats provide test results, since this is an area where we are short of data.

The meeting concluded with a recap of the action items and a call for acceptance of new work items. Those present agreed that all of the following payload format proposals should be accepted as AVT work items and be resubmitted as AVT drafts: multiple selective retransmission, generic FEC with ULP, AC3 audio, DSR and SONET. Confirmation of this agreement must come from the mailing list, so anyone who believes any of these proposals is not appropriate as an AVT work item is requested to comment on the mailing list.


Enhancements to CRTP
H.263++ versus RFC2429
Lightweight IP Encapsulation (LIPE) Scheme
MIME Type Registration of AMR Speech Codec
MPEG-4 on IP Framework
Low Delay RTCP Packet Format for Backward Messages
RTCP-based Feedback for Predictive Video Coding
RTCP Payload Format for MPEG-4 Backward Channel Messages
Introduction to RTCP Backchannel Proposals
RTCP Reporting Extensions
RTP Payload Format for AMR Requirements
RTP Payload Format for AMR Scenarios
MIME-Type Registration of AMR Speech Codec
RTP Payload Crypto Profile
RTP Payload Format for DV Video
An RTP Payload Format for Generic FEC with Uneven Level Protection
Status Report on MP2T Extension to RTP
Future Steps in MP3
MPEG-4 Audio/Visual RTP format
RTP Payload for MPEG-4 with Flexible Error Resiliency
RTP Multiplexing using Tunnels (TCRTP)
RTP Payload for Comfort Noise
ITU-T Recommendation G.722.1: The RTP Packetization
RTP Profile for RTCP-based Retransmission Request for Unicast Session
RTP Interoperability Testing
RTP Payload Type Format to Enable Selective Retransmissions
RTP Payload for AC-3 Audio Streams
An RTP Payload Format for Erasure-Resilient Transmission of Progressive Multimedia Streams
RTP Payload Format for Distributed Speech Recognition