Current Meeting Report

2.7.1 Audio/Video Transport (avt)

NOTE: This charter is a snapshot of the 52nd IETF Meeting in Salt Lake City, Utah USA. It may now be out-of-date. Last Modified: 05-Oct-01
Chair(s):
Stephen Casner
Colin Perkins
Transport Area Director(s):
Scott Bradner
Allison Mankin
Transport Area Advisor:
Allison Mankin
Mailing Lists:
To Subscribe:
Description of Working Group:
The Audio/Video Transport Working Group was formed to specify a protocol for real-time transmission of audio and video over UDP and IP multicast. This is the Real-time Transport Protocol, RTP, together with its associated profile for audio/video conferences and payload format documents.

The current goals of the working group are to revise the main RTP specification and the RTP profile ready for advancement to draft standard stage (including the sampling algorithms for use with very large groups, which have been broken out into a separate document), to complete the RTP MIB, to produce a guidelines document for future developers of payload formats and to continue development of new payload formats.

The payload formats currently under discussion include a number of media specific formats (MPEG-4, DTMF, PureVoice) and FEC techniques applicable to multiple formats (parity FEC, Reed-Solomon coding).

Archive before July 2001:

Goals and Milestones:
Done   Working group last call on guidelines for payload format writers (BCP)
Done   Post revised DTMF payload format draft, ready for WG last call
Done   Post revised RTP spec and audio/video profile
Done   Working group last call on parity FEC draft (standards track)
Done   Post revised RTP MIB and issue working group last call (stds track)
Done   Post RTP implementation checklist draft
Done   Post revised draft on PureVoice (qcelp) payload format to address WG last call comments
Done   Post payload format for MPEG-4 based on MPEG/IETF joint meetings
Done   Post revised RTP membership (SSRC) sampling draft
Done   Submit RTP MIB to IESG for publication as Proposed Standard RFC
Done   Submit guidelines for payload format writers for publication as a BCP
Done   New working group last call on PureVoice payload format
Done   Analysis/simulation of multiplexing payload format proposals
Done   Working group last call on revised SSRC sampling draft (experimental)
Done   Post final revision of RTP spec and A/V profile drafts
Done   Revise MPEG-4 payload format document after implementation experience
Done   Working group last call on RTP and A/V profile (for Draft Standard)
Done   Decide how to proceed with multiplexing protocol: one generic payload format or a number of application specific formats
Done   Prepare MPEG4 implementation results ready for WG last call
Done   Post final revisions of selected multiplexing protocol draft(s)
Done   Working group last call on multiplexing payload format (stds track)
Request For Comments:
RFC1889  PS  RTP: A Transport Protocol for Real-Time Applications
RFC1890  PS  RTP Profile for Audio and Video Conferences with Minimal Control
RFC2032  PS  RTP payload format for H.261 video streams
RFC2029  PS  RTP Payload Format of Sun's CellB Video Encoding
RFC2190  PS  RTP Payload Format for H.263 Video Streams
RFC2198  PS  RTP Payload for Redundant Audio Data
RFC2250  PS  RTP Payload Format for MPEG1/MPEG2 Video
RFC2343  E   RTP Payload Format for Bundled MPEG
RFC2354      Options for Repair of Streaming Media
RFC2429  PS  RTP Payload Format for the 1998 Version of ITU-T Rec. H.263 Video (H.263+)
RFC2431  PS  RTP Payload Format for BT.656 Video Encoding
RFC2435  PS  RTP Payload Format for JPEG-compressed Video
RFC2508  PS  Compressing IP/UDP/RTP Headers for Low-Speed Serial Links
RFC2733  PS  An RTP Payload Format for Generic Forward Error Correction
RFC2736      Guidelines for Writers of RTP Payload Format Specifications
RFC2762  E   Sampling of the Group Membership in RTP
RFC2793  PS  RTP Payload for Text Conversation
RFC2833  PS  RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals
RFC2862  PS  RTP Payload Format for Real-Time Pointers
RFC2959  PS  Real-Time Transport Protocol Management Information Base
RFC3009  PS  Registration of parityfec MIME types
RFC3016  PS  RTP payload format for MPEG-4 Audio/Visual streams
RFC3047  PS  RTP Payload Format for ITU-T Recommendation G.722.1
RFC3119  PS  A More Loss-Tolerant RTP Payload Format for MP3 Audio
RFC3158      RTP Testing Strategies

Current Meeting Report

Audio/Video Transport Working Group Minutes

Reported by Stephen Casner and Colin Perkins

The Audio/Video Transport working group met for two sessions at the 52nd IETF in Salt Lake City. As usual, we filled all the scheduled time (5 hours) plus a little run-over time on the first session since, unlike the usual schedule, it was in the Monday night slot. The first session covered the RTP payload formats already under discussion in the WG plus proposals for a few new ones. The second session covered the status of the RTP protocol itself, plus several extensions including new profiles for security and RTCP-based feedback.

The meeting opened with an update by Steve Casner on the status of documents in process. The group has had a single RFC published since the last meeting: RTP Testing Strategies (RFC 3158). The RTP payload formats for DV audio & video are on the RFC Editor's queue. The RTP payload for AMR and AMR-WB audio has been waiting for AD approval since July, but shortly before the meeting we received notice of IESG objections that the document is not sufficiently clear in structure and language. Steve apologized to the authors and the other standards bodies waiting for this document because the delay in providing this response does not represent satisfactory performance by the working group and the IETF. However, this also serves as a lesson to the group that we must put in the effort to carefully review documents for clarity of writing and structure in addition to technical correctness before we submit them to the IESG. The authors of the AMR draft are revising it with the help of the AVT chairs so that it can be resubmitted to the IESG as soon as possible.

In addition to the documents already in process, there are several drafts potentially ready for WG last call for comments. The working group agreed that the following drafts should go to last call. All of these are to go to Proposed Standard (except as noted):

- The Comfort Noise payload format (draft-ietf-avt-rtp-cn-04.txt) has reverted to a fixed 8 kHz clock rate for static payload type 13 (the change to make it variable in the previous version conflicted with the definition of a static payload type). There were no objections to going to WG last call. The main question for last call is whether there will be any backwards compatibility problems with early implementations of CN that included only the amplitude and not spectral information.

- The Enhanced CRTP (draft-ietf-avt-crtp-enhance-03.txt) and Tunneled CRTP (draft-ietf-avt-tcrtp-05.txt) drafts comprise AVT's product for RTP multiplexing. The question raised at the last meeting about whether the IPv4 ID field would be maintained under the "twice" algorithm has been resolved by having the decompressor learn the number of repetitions the compressor will send, and knowing that the context including the ID will be intact if fewer packets than that have been lost. The robust mode of operation is now described much more clearly, and PPP negotiation has been removed to a separate document. The group hummed agreement for issuing WG last call on these (tcrtp will be for BCP).

- There are two drafts for payload formats to provide unequal error protection to different parts of a data stream. The Uneven Level Protection draft (draft-ietf-avt-ulp-02.txt) is an extension of the FEC payload format in RFC 2733. It was accepted for WG last call last time, but needed the MIME registration that was added in this revision. The second draft on Unequal Erasure Protection (draft-ietf-avt-uxp-01.txt) provides a somewhat similar service, but for different environments or requirements. These two drafts use different techniques that don't make sense to merge, so we have decided to proceed to Proposed Standard with both of them and see whether sufficient implementation interest exists for either. In order to convince the IESG that we should standardize two methods, each draft needs to have an applicability statement to indicate when it should be used. The uxp draft already has this; the ulp draft needs to add it. From these statements we will compose the cover letter to the IESG for these two drafts.

One draft not on the agenda for this meeting was on transfer of R2 signaling over the Internet (draft-markov-r2oip-00.txt). Henning Schulzrinne has been in contact with the author and will fold it into a revision of RFC 2833.

RTP Payload Formats for MPEG-4

The first group of payload formats discussed at this meeting are for MPEG-4. Philippe Gentric presented an overview of the four payload format documents and the relationships among them.

RFC 3016 was instigated by the ITU for use in H.323 conferencing and telephony systems. It supports only video elementary streams and audio streams encapsulated in LATM for applications that do not need MPEG-4 Systems functions. The "generic" payload format specified in draft-ietf-avt-mpeg4-multiSL-03.txt can encapsulate any MPEG-4 elementary stream including the Synchronization Layer (SL). The third draft, draft-ietf-avt-mpeg4-simple-00.txt, is not a payload format of its own, but rather an application note which specifies a subset of the generic payload format: only audio and video elementary streams with no SL. It is intended to simplify understanding, implementation and use for the applications promoted by the Internet Streaming Media Alliance. The fourth draft, draft-curet-avt-rtp-mpeg4-flexmux-02.txt, is separate from these; it specifies RTP transport for MPEG-4 carried in a multiplex tool designed by MPEG called FlexMux. See Philippe's slides for a nice picture showing the overlap among these payload formats. Colin Perkins requested that this picture be included in the generic draft.

Philippe mentioned as an aside that there is a fifth document describing a framework for MPEG-4 over IP. A copy/paste of the IETF draft is at almost-standard status in the MPEG committee. Colin Perkins said he thought the draft was dead because it has not been updated in IETF for a long time. Colin expressed concern about the document being standardized in MPEG because there are several problems with it, in particular with inconsistencies relative to other documents. This was to be discussed further offline.

The generic payload format has evolved through about 13 revisions with different names. In the last two revisions, which were done since the previous meeting, the spec is not changing technically, but there have been a number of clarifications prompted in part by questions from implementers. In particular, the description of the two interleaving modes was explained more clearly, and an explanation was added to say how to handle rollover of the MPEG CTS timestamp, which is carried in the RTP timestamp, when the CTS is shorter than 32 bits. In the most recent revision, some text was restructured to make it easier to read, and references to MPEG-4 Systems were removed where the dependence was not really required.
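The rollover case mentioned above can be illustrated with a minimal sketch. It assumes a receiver that remembers the last extended timestamp and then sees a new value truncated to `bits` bits. The function name `unwrap_timestamp` and the "nearest value" rule are illustrative assumptions, not text taken from the draft.

```python
def unwrap_timestamp(prev_extended: int, truncated: int, bits: int) -> int:
    """Extend a `bits`-wide wrapping timestamp to the value nearest prev_extended.

    Hypothetical helper: assumes successive timestamps never differ by more
    than half of the counter range.
    """
    modulus = 1 << bits
    half = modulus >> 1
    # Distance from the previous value, reduced modulo the counter range.
    delta = (truncated - prev_extended) % modulus
    if delta >= half:
        delta -= modulus  # interpret as a small step backwards
    return prev_extended + delta


if __name__ == "__main__":
    # A 16-bit counter wrapping from 0xFFF0 to 0x0010 moves forward by 32 ticks.
    print(hex(unwrap_timestamp(0xFFF0, 0x0010, 16)))
```

The same arithmetic applies whether the short value is the CTS or any other wrapping counter; only the `bits` parameter changes.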

Colin agrees that this revision is better, although on his last read-through he found there were still some editorial changes needed, otherwise it will come back from the IESG like the AMR payload format did. However, the big question, which generated quite a bit of discussion, is that the payload format remains quite complex. In particular, Colin claimed that the inclusion of two interleaving modes introduces significant complexity for relatively little gain, and that the reason we have both simple and generic drafts is because of the inclusion of the second interleaving method in the generic draft.

Philippe explained that the TSBI mode is timestamp based, with a timestamp on every piece of data. It works well for audio and video streams and is the mode used by ISMA in the simple draft. However, this does not work with Systems streams, such as scene description streams for which it is not possible to construct a timestamp for every piece to be interleaved. That requires the IBI mode in which interleaving is based on the index of the packet. Colin countered that those streams need to be delivered reliably, so why is there a need to interleave them (since the purpose of interleaving is to disperse the effects of loss)? The answer is that in certain cases of fragmentation of audio and video access units into SL packets, the fragments don't all carry timestamps. With TSBI you can only interleave entire audio frames, not fragments of frames as is appropriate for AAC; you can't deliver the most advanced audio tools in MPEG-4.
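To make concrete the point that interleaving disperses the effects of loss, here is a minimal sketch of a generic index-based block interleaver. It is not the IBI mode as specified in the draft; the function names, the `depth` parameter and the use of `None` to mark a lost unit are all assumptions for illustration.

```python
def wire_order(total: int, depth: int) -> list:
    """Indices of the original units in the order they are sent."""
    return [i for start in range(depth) for i in range(start, total, depth)]


def interleave(units: list, depth: int) -> list:
    """Reorder units so that neighbours end up `depth` positions apart."""
    return [units[i] for i in wire_order(len(units), depth)]


def deinterleave(received: list, depth: int) -> list:
    """Restore the original order; lost units (None) stay None."""
    out = [None] * len(received)
    for pos, unit in zip(wire_order(len(received), depth), received):
        out[pos] = unit
    return out


if __name__ == "__main__":
    sent = interleave(list(range(9)), 3)
    # Simulate a burst that wipes out three consecutive units on the wire.
    sent[3:6] = [None, None, None]
    print(deinterleave(sent, 3))
```

After de-interleaving, the burst of three consecutive losses shows up as three isolated gaps, which is the property the discussion refers to.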

Steve Casner commented that the separate AAC payload format draft proposed earlier defined interleaving to handle the specific requirements of that audio type. Perhaps this is the correct way to handle other advanced requirements, with specific solutions as they arise, rather than trying to make something generic that will handle all of them.

Dave Singer explained that people expect to implement translators from MPEG-4 on other channels, which may use any and all SL functions, to MPEG-4 over RTP. They believe they must be able to carry anything that may appear even if they don't have a convincing argument as to why some of those things should ever appear on the SL side. Philippe agreed that there are features in MPEG-4 that nobody can say for sure will really be needed, but they want to keep the freedom to use them. The generic draft does impose some restrictions that were agreed after difficult discussions in MPEG, and going back to try to reduce functionality further would be a serious problem. An example feature which needs the ability to interleave fragments is error sensitivity categories; this has been in development in MPEG for two years, and the developers expect to have some way to carry it on IP. On the other hand, Stephan Wenger believes this work does not address the kind of problems we will see with RTP transport, e.g. packet loss.

Steve observed that the friction arises from a basic difference in philosophy between IETF and ISO: in IETF, we say that if you don't know for sure that you need some feature, it should be left out. We try to start with the basics and learn more about the problem, then add on other pieces as the real need is demonstrated. There may be some things you can do with MPEG-4 that you really just should not do. A profile of MPEG-4 could restrict what is accepted for transport over RTP to avoid this problem. He asked how much has been implemented; the ISMA companies are implementing the simple subset, but what about all the more complicated parts we have been discussing?

Dave Singer responded that we have a chicken and egg problem. Because it is complicated, implementers are holding off until the drafts stop changing and become a stable RFC. Dave Oran asked if people are aware that this won't make it to Draft Standard unless all of the pieces are implemented. It might go to Proposed Standard, but all the pieces that are not implemented and interoperability tested will get ripped out anyway. We presume the goal is to get a Full Standard someday. Singer agreed; we can say to the people who have insisted on these features that they must show us the interoperating implementations or they get yanked. Philippe said that interoperability statements may be hard to collect because interoperation is not disclosed publicly, but Colin replied that the interop requirements can be done under NDA if necessary. Colin asked how MPEG will respond if some features in the Proposed Standard are removed when it goes to Draft Standard. Philippe believes this is likely to be a problem. Stephan surmised that the response would likely be similar to that of the ITU: if there are features removed along the path, codepoints will be assigned for the different RFCs at the different levels, where the newer one is less powerful than the old one.

Philippe asked for this draft to go to WG last call. Colin asked for more opinions from others as to whether they think this is now at a stable technical level or whether they think other technical changes should be made. He said the key point to take from the discussion is that we are pushing the limits of what we can publish as an RFC because of the complexity of it. We are concerned that the IESG is going to take one look and reject it. We have to make a decision: should we give up trying to simplify this and just push it now?

Stephan spoke in favor of trying to give this a shot at last call. He sees the complexity argument, and would like it to be simpler, but thinks there is no real way to make it simpler. For example, if the elementary stream section were split out from the SL packetization, then people would come back and start asking for the missing part again. The fact that people are implementing it indicates that there is interest in the industry. We can't make a judgment on whether it is technically sound without seeing several implementations of it, and this won't happen until it is published as an RFC.

Steve Casner asked for consensus from the room. There was a slight hum for proceeding to last call. Colin's editorial revisions will be taken as part of last call comments.

Philippe concluded his presentation with the status of the simple draft, draft-ietf-avt-mpeg4-simple-00.txt. A few textual clarifications were made and a new draft revision issued in September 2001 after the London IETF meeting. At that point, the technical spec was frozen; in fact it has been copy/pasted into the ISMA 1.0 technical specification as an appendix since it is not an RFC yet, with the intention of replacing that with a reference to the RFC later. For this document as well, he asks to go to WG last call.

Stephan was against rubber stamping a document that has been frozen by another group. His concern was triggered by a similar situation in the ITU where the text of a draft was incorporated into H.225.0, but the semantics of one field changed before the draft was published as RFC 2190. The result was that a second code point had to be assigned for the RFC 2190 payload format. He wants to avoid having this happen again. Philippe and Dave Singer both said they believe this situation is different. There has been a lot of interaction with the IETF in the development of the payload format. The ISMA would much prefer to reference the RFC. If there are changes that occur in the RFC process, those will be accommodated in version 1.1 of the ISMA spec.

Colin said the main issue for the last call is to make sure this draft is absolutely consistent with the generic draft, and asked for other people in the group to do some very nit-picky proofreading of both drafts. Philippe believes this is not likely to be a problem because the controversial issues in the generic draft are the ones which are excluded from the simple draft.

Those at the meeting hummed in agreement for issuing WG last call. Steve stated a reminder that the hums we take in the meeting are only preliminary agreement; the final decisions are made on the mailing list.

The third payload format is draft-curet-avt-rtp-mpeg4-flexmux-02.txt for MPEG-4 FlexMux streams, presented by Catherine Roux. This payload format is separate from the others and would be identified by a new MIME subtype name "mpeg4-flexmux". The motivation is that there may be a large number of elementary streams in a scene, so the FlexMux multiplexes all of those into one RTP stream. The payload format itself is simple: an integral number of FlexMux packets are packed into one RTP packet. There is no need for fragmentation because fragmentation is handled in the SL packets.

The complications arise from the fact that as FlexMux is used on other channels, streams that require reliable transmission and those that don't are all multiplexed together into one stream. In general, this won't provide satisfactory performance on IP networks. In fact, both the generic and FlexMux payload formats have the problem that they can encapsulate streams which must be communicated reliably, yet they don't provide reliable transport. This must be addressed by a combination of an "applicability statement" for the environment in which the payload format will work, and the restrictions on the content of the payload that are required to allow it to work. Three issues were identified for FlexMux:

1. It may be necessary to send two types of FlexMux, one over RTP for audio and video data, and another using TCP or some other mechanism to achieve error-free transmission of System data such as IOD and BIFS. In that case, the draft must specify how the two will be synchronized and what to do when the TCP stream is delayed more than the RTP stream.

2. The configuration of the FlexMux itself may be static and signaled out-of-band, e.g., with SDP. (The slides propose a new SDP media attribute a=mpeg4-flexmuxinfo, but as a result of discussions before the meeting, the a=fmtp attribute will be used instead.) Alternatively, it may be dynamic with configuration changes transmitted in-band. In the latter case, the changes must be transmitted without errors or subsequent decoding of packets will produce garbage.

3. As currently specified, the RTP timestamp indicates the transmission time of the RTP packet. This can be used for jitter measurement and removal. However, because the clock is independent of the CTS and DTS timestamps of the SL level, the RTP timestamp can't be used to synchronize with other RTP streams.

For the second issue, Philippe added that it is possible to send configuration changes early and tell when they become valid. But Steve stated that some form of ack would be needed to confirm that the new configuration was received before starting to send packets in that configuration. Philippe replied that the time it takes to do the handshake (the ack) is unbounded, but FlexMux is designed such that a change to the configuration might be made every 50ms and it has to happen right on time. The round-trip delay may be excessive. Steve pointed out that, by definition, reliable communication requires unbounded time. This is a fundamental problem with trying to take something that was originally intended for carriage over a circuit and trying to put it on IP and expect that it will work. You have to make some accommodations.

Dave Singer stated that misdecoding packets due to a lost configuration packet is unacceptable. He suggested that one way to avoid this is to put a configuration seed (generation) in every packet. If you see the seed change and you have not received the corresponding configuration, then you don't decode it. Steve agreed that a generation number could be part of the encapsulation of FlexMux into RTP, but there are a number of issues like this that must be worked through before this can be a valid payload format. It is a combination of working out mechanisms to solve problems and expressing restrictions on the domain to which the payload format can be applied since there are some problems we can't fix.
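Dave Singer's generation-number idea can be sketched as follows. This is only an illustration under stated assumptions: the class `FlexMuxReceiver`, its method names and the representation of a configuration as a plain dictionary are hypothetical, and no such field exists in the draft as presented.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FlexMuxReceiver:
    """Hypothetical receiver that refuses to decode under an unknown configuration."""

    # Configurations received so far, keyed by generation number.
    configs: Dict[int, dict] = field(default_factory=dict)

    def on_config(self, generation: int, config: dict) -> None:
        """Record a configuration that arrived (in-band or out-of-band)."""
        self.configs[generation] = config

    def on_packet(self, generation: int, payload: bytes) -> Optional[Tuple[dict, bytes]]:
        """Return (config, payload) for decoding, or None if the config is missing."""
        config = self.configs.get(generation)
        if config is None:
            # The generation changed but the matching configuration never
            # arrived: skip the packet rather than misdecode it.
            return None
        return config, payload


if __name__ == "__main__":
    rx = FlexMuxReceiver()
    rx.on_config(1, {"channels": 2})
    print(rx.on_packet(1, b"audio"))   # decodable
    print(rx.on_packet(2, b"audio"))   # configuration 2 was lost
```

The sketch only prevents misdecoding; it does not recover the lost configuration, which is one of the open issues noted in the discussion.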

Dave is also concerned about the third issue, timestamps. For example, if you want to use FlexMux to send tiny audio packets, but you want to send video with it for which FlexMux is inappropriate, you can't send it as a separate RTP stream and use RTP methods to synchronize. The other MPEG-4 payload formats can be synchronized with GSM or H263 or other encodings carried in RTP, and so should FlexMux.

Steve agreed that if the transmission timestamp is not really required to dejitter the packets, or if it's only a little bit helpful, then certainly picking something more useful as the timestamp source is a good idea. The difficulty is that when many streams that don't have a common time reference are multiplexed together, then how do you choose which timestamp is the reference? Philippe explained that one of the reasons for this design choice is that SL packets that are not the first part of an access unit do not carry the timestamp. A box acting as a translator receiving a FlexMux stream over a circuit and transmitting on RTP could not assign a timestamp to each SL packet. However, he disagreed with that assumption and suggested that even though the SL packets within the FlexMux may have different timestamps, the system should use a rule such as using the slowest one. Dave went further to suggest that all the timestamps inside the FlexMux packet should be recalculated relative to the RTP timestamp. That calculation could be undone at the other end if desired. Then if the RTP timestamp gets moved, everything moves with it, but things that were supposed to be synchronized can stay synchronized.
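Dave's suggestion of recalculating the inner timestamps relative to the RTP timestamp can be illustrated with a small sketch. It assumes, for simplicity, that all timestamps share one clock rate and are plain integers with no wraparound; the function names are hypothetical.

```python
def to_relative(rtp_timestamp: int, inner_timestamps: list) -> list:
    """Sender side: express each inner timestamp as an offset from the RTP timestamp."""
    return [ts - rtp_timestamp for ts in inner_timestamps]


def to_absolute(rtp_timestamp: int, offsets: list) -> list:
    """Receiver side: undo the calculation using the RTP timestamp actually received."""
    return [rtp_timestamp + off for off in offsets]


if __name__ == "__main__":
    offsets = to_relative(1000, [1000, 1040, 990])
    # If a translator moves the RTP timestamp, every inner timestamp moves with it.
    print(to_absolute(1500, offsets))
```

Because only offsets are carried, units that were meant to be synchronized keep the same spacing after the RTP timestamp is shifted.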

RTP payload format for GSM EFR speech

Peter Barany presented draft-barany-avt-efr-00.txt, a proposal for a new payload format for GSM EFR coded speech. It should be noted that Section 4.5.9 of draft-ietf-avt-profile-new-12.txt already defines a payload format for this codec, so the main question for this proposal is whether a different payload is justified.

The motivation for this payload format is to support a large number of legacy GSM transceivers already in the field as part of a transition to GERAN packet switched architecture. These devices implement the EFR speech codec with its particular form of interleaving and other characteristics. The existing EFR payload format does not provide a means to convey an indication of damaged frames that may be provided by a radio interface at the point where it is gatewayed to the packet network. The purpose of the damaged frame indicator is to allow the decoder at the RTP receiver to make partial use of the bits in the damaged frame as opposed to treating the frame as a complete loss. The draft proposes a payload format similar to the existing one but replaces the 4-bit signature at the head of each frame with a set of flags. In the -00 draft, just one flag bit (Q) is defined to indicate good vs. damaged frames, but in the presentation it was proposed to copy 4 bits from an existing circuit interface between base stations and mobile switching centers. Those bits also indicate SID frames.

Steve Casner asked why not use the EFR mode within the AMR codec, since the AMR payload format supports the Q bit. The reason is that the existing GSM devices do implement EFR and its framing, but not AMR framing.

Colin Perkins suggested that, with an appropriate choice for the sense of the status bits, in the case of a good packet the bit pattern would be exactly the same as the 4-bit signature in the existing payload format. If one or more status bits changed to indicate a damaged frame, then implementations of the existing payload format should reject the frame as malformed. This would allow keeping a single payload format and extending it in a backward-compatible manner.
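Colin's backward-compatibility argument can be sketched as a receiver-side check. The sketch assumes the existing 4-bit signature occupies the high nibble of the first octet and has the value 0xC; both the constant and the function `classify_frame` are illustrative assumptions, and the meaning of individual status bits is deliberately left undefined because the proposal did not fix it.

```python
EFR_SIGNATURE = 0xC  # assumed value of the existing 4-bit signature


def classify_frame(frame: bytes, understands_status_bits: bool) -> str:
    """Classify a frame by its leading 4 bits.

    Returns "good" when the bits match the existing signature, "damaged" when
    they differ and the receiver implements the proposed status bits, and
    "rejected" when they differ and the receiver only knows the old format.
    """
    if not frame:
        raise ValueError("empty frame")
    leading_bits = frame[0] >> 4
    if leading_bits == EFR_SIGNATURE:
        return "good"
    return "damaged" if understands_status_bits else "rejected"


if __name__ == "__main__":
    good = bytes([0xC0]) + bytes(30)
    flagged = bytes([0x40]) + bytes(30)
    print(classify_frame(good, False), classify_frame(flagged, False), classify_frame(flagged, True))
```

A good frame is handled identically by old and new receivers, while an old receiver treats a flagged frame as malformed, which amounts to treating it as lost.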

A larger discussion centered on the question of whether adding the status bits would provide a sufficient improvement in performance to justify having a second payload format. In particular, does a receiver process any of the bits in a damaged frame, or simply discard them and do frame substitution? If the latter, then the RTP sequence number already provides the means to indicate missing packets. The claim was made that the MOS (intelligibility) scores were dramatically improved by using the partial information of damaged frames, although that data was not available in the presentation.

It was unclear in the discussion whether the motivation for marking bad frames within the frame signature rather than just treating damaged frames as lost (using the RTP sequence number) stems from a desire to pass the bad frame indication over the "optimized bearer channel" where the IP/UDP/RTP headers are stripped ("zero-byte header compression"). If so, then the format of frames sent over that channel is not really a concern for AVT. RTP will be used between the mobile system (RNC) and the PSTN, but the following thought arose during the preparation of these minutes: why not use AMR payload format on the portion of the network where RTP is really carried? AMR already includes the Q bit and can carry GSM-EFR. The comfort noise (SID) frames are different, but the translation could be handled as part of the gateway function to the legacy portions of the system. That way the upstream portions of the system would already have transitioned to the newer mode of operation.

Peter asked if this payload format can become a working group item. Colin asked for a hum, the result of which sounded like nearly nothing in both directions. Steve observed that the proposal seems to be still in flux, and that it should stabilize before we consider it as a working group item.

Colin suggested that another individual draft should be produced which includes the data showing the improvement in performance (MOS scores) as a result of including the damaged frame indication, and then at the next meeting we'll take up whether we should consider it as a working group item. He also asked that the idea be considered to choose the bit pattern to match the existing signature in the normal case.

RTP Payload Format for EVRC, SMV and Frame-Based Vocoders

Adam Li presented draft-li-avt-vocoder-00.txt which is actually not new because it is the combination of draft-ietf-avt-evrc-08.txt, which has gone through several revisions and was accepted for WG last call, with draft-mathai-avt-smv-00.txt and draft-espelien-avt-common-00.txt. The EVRC and SMV payload format drafts were combined because they were essentially the same, so the combined draft just needs to define both sets of payload format constants and both MIME subtype registrations. Additional information was added to specify how the payload format could be extended to other vocoders.

There was only one technical change. For type 1 (octet-aligned) packets, a new one-octet header is added to indicate the count of frames present and to carry the flow-control (rate control) bits which used to be replicated in the Table of Contents entries for each frame. That allowed the ToC entries to be reduced from 8 bits to 4 bits each, with 4 bits of padding at the end only if the frame count is odd. For type 2 packets (no payload header, single frame only), there is no change.
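The saving from the 4-bit ToC entries can be worked through with a short sketch. The nibble order (first entry in the high half of the octet) and the zero padding value are assumptions, as are the function names; the layout of the new one-octet header itself is not modelled.

```python
def pack_toc(entries: list) -> bytes:
    """Pack 4-bit ToC entries two per octet, padding with 4 zero bits if the count is odd."""
    if any(not 0 <= e <= 0xF for e in entries):
        raise ValueError("each ToC entry must fit in 4 bits")
    padded = list(entries) + ([0] if len(entries) % 2 else [])
    return bytes((padded[i] << 4) | padded[i + 1] for i in range(0, len(padded), 2))


def overhead_octets(frame_count: int) -> int:
    """One header octet plus the packed ToC, rounded up to a whole octet."""
    return 1 + (frame_count + 1) // 2


if __name__ == "__main__":
    print(pack_toc([1, 2, 3]).hex())
    print(overhead_octets(3), overhead_octets(4))
```

With one full octet per ToC entry, as before the change, the ToC alone would take `frame_count` octets, so the new layout costs no more once a packet carries two or more frames and saves space beyond that.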

Adam requests that this draft be accepted as a WG item and that it proceed quickly to WG last call since the EVRC draft had already been approved.

Colin Perkins expressed concern that this draft tries to be too general. Combining EVRC and SMV makes sense, but trying to extend it to future, as-yet-unspecified codecs seems to be stretching it a little. It may be more appropriate to limit the payload format to just those two codecs. If we have another codec in the future, we can slot it into this same framework if it is appropriate, but we don't want to encourage people to shoe-horn codecs in. You're going to have to produce a new RFC for a new codec in any case (to define the MIME type and other parameters).

Adam replied that this payload format is not for every codec, but that it tries to specify a set of criteria for other codecs that might want to use it. People can choose to use this for future codecs that meet the criteria, or may design something new. But Colin noted that the same is true of every payload format we have, so it is not necessary to explicitly state that in this draft. This draft is not exceptional in its reusability; it just happens to fit a class of codecs. Pete McCann observed that the reason EVRC and SMV fit together is that they were both designed for CDMA-2000 air interface. A new codec designed for that interface would probably also fit. Maybe it is inappropriate to be more broad than that.

One question was whether the designer of a new codec would have to pay for expensive MOS tests to justify creating a different payload format, noting that such data was requested in the discussion of EFR. Steve replied that one can always define a new payload format for a new codec. The pushback for EFR is because we already have a payload format for that codec and we prefer not to have more than one for interoperability reasons.

It was agreed to make a revision of this as a working group document.

RTP Payload Format for AC-3 Streams

Jason Flaks presented draft-flaks-avt-rtp-ac3-00.txt which specifies a new payload format for the Dolby Digital AC-3 codec used for DVD recordings and ATSC digital terrestrial TV with up to 5.1 channels of audio. AC-3 frames contain sync and CRC fields plus 1 to 6 audio blocks, one for each channel of audio. The frames are fairly large, which means they may be fragmented. As a result, the RTP payload format uses fields of the RTP header with standard meanings but in a manner more like a video codec than an audio codec. The timestamp corresponds to the sampling instant of the first frame contained in the packet, and multiple packets may have the same timestamp when a frame is fragmented. The marker bit is set on packets that contain the last fragment of a frame (or one or more whole frames). There is an 8-bit payload-format-specific header (misnamed a header extension in the slides) that specifies the number of frames contained in the packet, a fragment sequence number (the slides erroneously say frame sequence number), and a redundant data flag.

The packet may contain multiple frames, the first of which may be a frame fragment, but since the frames are usually large (ranging from 80 to 3840 bytes) there will usually be only one frame per packet. At the large end, one frame exceeds the MTU of most networks, so frames may be fragmented. The fragment sequence number increments with each fragment of a frame, starting from 0. AC-3 has a rule that the first two audio blocks are guaranteed to be in the first 5/8ths of a frame, which is covered by the first CRC, so decoding can start at that point. That makes a logical fragmentation point for RTP as well when the size is appropriate for the MTU, although with RTP there is no guarantee that the remaining 3/8ths will arrive.

On film, there is a redundant 2-channel audio track that is used if the 5.1 channel digital audio track wears out. This payload format proposes an analogous redundancy mechanism where redundant lower-rate 2-channel data for frame N+1 is carried in the packet along with the primary data for frame N. Presence of the redundant data is indicated by a bit in the payload header.

Steve Casner asked if the fragment sequence number is needed to know how to place the fragments. The RTP sequence number should be sufficient for putting the fragments in the proper order. If you also have a rule that only the first frame can be a fragment, and later frames in a packet must be integral, then there is no confusion about how many fragments there are. (This payload format does include that rule, so the fragment sequence number may not be needed.) Two comments arose in the preparation of these minutes. First, it may be more prudent to use a 32-bit payload format header: these frames are large, so the overhead is not significant, and the current field definitions are quite small. Second, a larger header would also allow replacing the fragment sequence number with a fragment offset so that data may be placed properly in a contiguous buffer even if the fragments arrive out-of-order.
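The fragment-offset idea can be sketched as follows. This is only an illustration under stated assumptions: the function name and the (offset, data) tuple are hypothetical, the frame size is assumed to be known to the receiver, and nothing here reflects an actual field layout from the draft.

```python
def reassemble(frame_size: int, fragments):
    """Place (offset, data) fragments into one contiguous frame buffer.

    Fragments may be supplied in any order. Returns the frame as bytes once
    every octet has been filled, or None if some part is still missing.
    """
    buffer = bytearray(frame_size)
    filled = [False] * frame_size
    for offset, data in fragments:
        end = offset + len(data)
        if offset < 0 or end > frame_size:
            raise ValueError("fragment lies outside the frame")
        buffer[offset:end] = data          # the offset alone decides placement
        for i in range(offset, end):
            filled[i] = True
    return bytes(buffer) if all(filled) else None
```

Because each fragment carries its own position, the receiver does not need to wait for earlier fragments before copying a later one into place.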

Philippe Gentric asked if demultiplexing of the channels was considered. Would there be an advantage in transmitting the two main channels in one RTP stream and the remaining channels in another? Jason replied that the audio encoding makes use of similarities among all 6 channels, so they need to be processed together.

Steve remarked that this is a perfectly reasonable payload format to undertake as a work item; the only consideration is the number of work items that we undertake. There was a fairly strong hum to accept this one.

Resilient MIDI RTP packetization

John Lazzaro gave a presentation on a resilient MIDI packetization designed for network musical performance as described in draft-lazzaro-avt-mwpp-midi-nmp-00.txt. Even though this is the first time this work was brought to AVT, experimentation using this payload format with RTP has been in progress for some time, so the draft is already quite substantial.

Latency is an issue for network performance, but for distances like the 350 miles between northern and southern California, the measured delay is the same as a 16-foot separation on stage. Network performance is feasible in such cases so long as additional buffering is not added for late packets. The answer is to send the MIDI encoding of the musicians' gestures and turn them into audio at the receiver without delay. Semantic recovery from late and lost packets produces a result that just sounds like imperfect performance rather than funny noises. Most of the draft is a specification of what the rules need to be to make the semantic recovery sound good. The rules were derived from two years of experimentation.

MIDI has a very low data rate (fast piano playing is 300 bps), but is very fragile. A missed "NoteOff" command causes a hung note forever. The proposed packetization uses a recovery journal to recover from lost packets without requiring explicit retransmission. The recovery journal is sent in every packet and contains a minimal session history since the last checkpoint packet. The checkpoint is advanced by the receipt of the RTCP "last packet received" statistic. This even works with small-scale multicast, taking the checkpoint from the least advanced receiver. The increase in data rate is not that significant: 4 kbps for fast piano work. Packets that arrive on time are always executed, but a packet that arrives late is processed according to the semantic rules that are specific to each command (e.g., the command may be superseded by a later packet that arrived earlier).
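The checkpoint mechanism described above might be sketched roughly as below. This is a toy model, not the packetization in the draft: the class and method names are hypothetical, the journal here stores raw commands rather than the minimal session history the draft specifies, and sequence number wraparound is ignored.

```python
class JournalSender:
    """Toy sender that carries a recovery journal in every packet."""

    def __init__(self, receivers):
        receivers = list(receivers)
        if not receivers:
            raise ValueError("at least one receiver is required")
        self.next_seq = 0
        self.journal = []                          # (seq, command) since checkpoint
        # Highest sequence number each receiver has reported via RTCP so far.
        self.reported = {r: -1 for r in receivers}

    def checkpoint(self) -> int:
        # With small-scale multicast, take the least advanced receiver.
        return min(self.reported.values())

    def send(self, command: str) -> dict:
        """Build a packet holding the new command and the current journal."""
        packet = {"seq": self.next_seq, "command": command,
                  "journal": list(self.journal)}
        self.journal.append((self.next_seq, command))
        self.next_seq += 1
        return packet

    def on_receiver_report(self, receiver, highest_seq_received: int) -> None:
        """Advance the checkpoint and drop journal entries it now covers."""
        self.reported[receiver] = max(self.reported[receiver],
                                      highest_seq_received)
        cp = self.checkpoint()
        self.journal = [(s, c) for s, c in self.journal if s > cp]
```

A receiver that loses packets can rebuild the missing state from the journal in the next packet that does arrive, so no explicit retransmission request is needed.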

The open issues for the payload format have to do with how much of the algorithm is normative. Currently, the semantic recovery rules are normative in the sense of a best current practice, but they may not be optimal. The algorithm for deciding on-time vs. late is not normative because different criteria may apply in different scenarios.

Steve Casner responded that it is fine to have those semantic rules in the draft, it should just be stated clearly that this is a recommended practice, a SHOULD (or RECOMMENDED) not a MUST. This is the first instance of a semantic requirement being on RTCP receiver reports, where the payload format won't work without RTCP. That's new territory, but should not be a problem. John has some concern about some of the robust header compression schemes that don't preserve sequence numbers exactly. All the sequence numbers need to be the right ones, not just close. Steve replied that the zero-byte header compression doesn't apply here because this medium doesn't have the regular timing that scheme depends upon, so it would not be used.

Another open issue is whether the payload format's performance could be improved with the flexibility under new profiles to define new RTCP-based feedback rather than assuming only the standard packet types. But it seems simpler to stick with the standard ones since that works. Colin agrees. Stick with this for at least the first version, and leave open the possibility of using features like RTCP-based retransmission later if appropriate.

The recovery and time semantics of this payload format are also compatible with MPEG-4 Structured Audio (MP4-SA). The draft normatively includes MP4-SA execution semantics for software synthesizers. However, this payload format is compatible with all software synthesizers, including those that do not use MP4-SA.

This proposal needs careful review both by RTP-savvy people and by MIDI-savvy people. IETF has an opportunity here to lead all the musicians who are just starting to look at networking down the right path rather than having to patch things up later.

Colin said this proposal is "cool" and clearly is a candidate for standardization. Steve agreed, as did many in the room.

RTP Payload Format for Distributed Speech Recognition

Although draft-ietf-avt-dsr-00.txt is a -00 working group draft, it has been discussed in AVT as an individual submission for quite some time, first by Jeff Meunier in July, 2000, and more recently by Qiaobing Xie. Qiaobing's presentation was very short because the only changes in this revision were to remove the frame pair counter since the count can be inferred from the packet length, and to remove the end of speech flag since that is now indicated by a special NULL frame pair. This means that now there is no payload header required at all. This draft also clarifies how discontinuous transmission can be supported. There are no open issues remaining.
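Inferring the frame pair count from the packet length amounts to a simple division, sketched below. The function name is hypothetical and the frame pair size is passed in as a parameter rather than assumed; the value used in the usage note is a placeholder, not a figure taken from the draft.

```python
def frame_pair_count(payload: bytes, frame_pair_octets: int) -> int:
    """Number of fixed-size frame pairs in a payload that has no header."""
    if frame_pair_octets <= 0:
        raise ValueError("frame pair size must be positive")
    count, remainder = divmod(len(payload), frame_pair_octets)
    if remainder:
        raise ValueError("payload is not a whole number of frame pairs")
    return count
```

With a placeholder frame pair size of 11 octets, a 33-octet payload would hold three frame pairs.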

Colin Perkins identified one simple bug in the document: the MIME registration specifies a parameter "sample-rate" which should be the common parameter "rate" instead.

Steve Casner remarked that it is pretty hard to argue with a payload format that just says put the data into RTP packets. So this proposal is fine, and we should proceed to last call. The group hummed yes.

RTP payload format for JPEG 2000 video streams

Eric Edwards presented a new RTP payload format for JPEG-2000 video streams (draft-edwards-avt-rtp-jpeg-2000-00.txt). JPEG-2000 is a new video format, which provides superior low bit-rate performance, strong error resilience and scalable operation. The key features of the proposed RTP payload format are robustness to packet loss through intelligent fragmentation, a priority information field for scalable delivery from the same bit-stream, and persistency of the main header to minimize the effects of loss. The JPEG committee has expressed considerable interest in participating in this work, and will provide feedback to AVT.

After giving an overview of the draft, he raised the open issues: are there any suggestions for a better fragmentation scheme? How to use the optional fields? How to handle or define use of priority mapping tables? Steve Casner noted that there are many different packet types, and wondered if all the different types are useful? If the receiver makes use of the type and priority fields to do something different, and that information is not already available in the data stream itself, then it makes sense to have those header fields. But, if the fields are redundant with information in the data stream, then putting it into the RTP header might not be a good idea, because it allows the possibility of inconsistency. It may be appropriate to use only a reduced subset of these types? Steve also noted that the use of RFC 2119 terms is needed.

Marshall Eubanks questioned the licensing terms for this format. It was clarified that IPR has not been claimed on this payload format, and that JPEG-2000 part 1 is "targeted to be royalty free" (this payload format transports a series of JPEG-2000 part 1 frames). There may be IPR on other parts of the JPEG-2000 standard.

The JPEG-2000 payload format was accepted as a work item of AVT.

Update to RTP specification and profile

Steve Casner discussed the changes that have occurred in the RTP specification since the last meeting: relax the requirement to use an even/odd port pair if the ports are explicitly signaled; explain that only one compound RTCP packet should be sent per interval, and that you should round-robin through sources if there are too many to send reports for all of them within the MTU; and explain the effect of varying packet duration on the jitter calculation. There were also a number of minor fixes and clarifications. To facilitate IESG acceptance, the open issues section has been removed, the table-of-contents moved to the beginning, the references have been split into normative and non-normative parts, and the "changes since RFC 1889" section has been updated.
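The round-robin behaviour mentioned above could look something like the following sketch. The generator name and the max_blocks_per_packet parameter are hypothetical; how many report blocks actually fit depends on the MTU and the packet format, which is not modelled here.

```python
def report_batches(sources, max_blocks_per_packet: int):
    """Yield, for each successive RTCP interval, the sources to report on.

    When there are more sources than fit in one compound packet, each
    interval picks up where the previous one stopped.
    """
    sources = list(sources)
    if max_blocks_per_packet < 1:
        raise ValueError("at least one report block must fit")
    start = 0
    while True:
        if not sources:
            yield []
            continue
        n = min(max_blocks_per_packet, len(sources))
        yield [sources[(start + i) % len(sources)] for i in range(n)]
        start = (start + n) % len(sources)
```

Only one batch is produced per interval, so every source is eventually covered without sending more than one compound packet at a time.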

Henning Schulzrinne asked about interoperability requirements for the reconsideration algorithm, and if we have satisfied the IESG. Steve explained the tests we have done, and said that we seem to meet all the constraints. Colin Perkins noted that he has discussed this with the area director, who seems happy with the tests conducted.

The Audio/Video profile had one clarification, that the RTP timestamp usually equals the sampling clock, but not always. There are also a number of changes to facilitate IESG acceptance, much as was done for the RTP specification. The RTP MIME draft now begins with a summary list of the subtypes being registered, and has the common fields replicated into each registration to make them individually stand-alone, plus other minor clarifications. The RTCP-BW draft has had the note about conflict with RFC 2327 changed to note that this is resolved in the update to SDP. This implies a dependency on the draft draft-ietf-mmusic-sdp-new, which may hold up publication of the RTCP-BW draft.

The plan is to send the publication request for RTP, and the related drafts, to the IESG before the meeting is out. Does anyone object? No. Stephan Wenger asked if the wording on congestion control was sufficient? Steve noted that we've had feedback from the area director that it is okay.

The official request for advancement of the RTP spec and A/V profile to Draft Standard status was sent to the Area Director on Friday of IETF week, December 14. This is a very significant milestone.

Secure RTP profile

Elisabetta Carrara discussed the recent changes in SRTP (draft-ietf-avt-srtp-02.txt). Since the -01 version, the optional SPI field and TESLA have been removed, and the draft has been rewritten as a framework. Other changes include clarification of the crypto-context, and the SRTCP coverage, as requested in London. A new pseudo-random key derivation function has been added, to derive the session keys from a master key. Colin Perkins asked if this key derivation scheme was mandatory, or if the session keys can be supplied manually? It is mandatory. The encryption transforms supported have been updated, with support for AES-CM IHA and shared master keys being removed. The key-stream definition has been updated, and clarified. The UMAC authentication scheme has been removed, due to lack of a stable specification, and TMMH/16 authentication has been added as an option.

The open issue is "sharing of the same master key among several RTP streams", which was a design goal in earlier versions of the draft, but is not possible with the new definition of AES-CM. The solution, to keep this function, is to insert a unique-per-stream value into the initialization vector of the streams that wish to share the same master key: can the SSRC be such a unique value? Colin Perkins agreed that this is a reasonable constraint.

Steve Casner praised the rewrite as being a great clarification, but noted the chairs' concern that the draft changes significantly with each version. Elisabetta assured the group that the draft was now stable except for editing in the open issue solution mentioned above. It was noted that HMAC is a 20-byte authentication mechanism, but this draft uses a 4-byte version? Yes, it was cut down for speed and real-time operation. It was noted that figure 1 in the draft is misleading, due to a padding problem, and needs to be clarified. It was asked if unequal error protection is supported? Yes.
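The 20-byte versus 4-byte point can be illustrated with Python's standard hmac module. This sketch assumes HMAC with SHA-1 (whose output is 20 octets) and assumes the tag is shortened by keeping the leftmost octets; the function names are hypothetical, and which parts of the packet the draft authenticates is not modelled.

```python
import hashlib
import hmac


def truncated_tag(key: bytes, message: bytes, tag_octets: int = 4) -> bytes:
    """HMAC-SHA1 tag shortened to its leftmost tag_octets octets."""
    full = hmac.new(key, message, hashlib.sha1).digest()   # 20 octets
    if not 1 <= tag_octets <= len(full):
        raise ValueError("tag length must be between 1 and 20 octets")
    return full[:tag_octets]


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the truncated tag and compare in constant time."""
    return hmac.compare_digest(truncated_tag(key, message, len(tag)), tag)
```

The shorter tag saves 16 octets per packet, at the cost of a weaker forgery bound than the full 20-octet output.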

An implementation is available at

Steve Casner noted that there are a number of clarifications to make, and the open issue needs to be resolved. Once that is done, the draft is expected to be ready for working group last call.

RTP extensions for classification and priority

Ken Carlberg discussed RTP extensions for classification (draft-carlberg-RTP-classifier-extension-00.txt). This is a means of marking packets, for example to indicate emergency calls, defined as a header extension. There is also a draft by James Polk (draft-polk-avt-rtpext-res-pri-00.txt) that is very similar.

Dave Oran noted that this is only needed if you need to mark different packets in a stream differently, since otherwise you'd just signal the classification at call setup time. Steve Casner and Henning Schulzrinne also expressed confusion about why this was useful, as opposed to signaling it during call setup or using something like differentiated services to mark the packets. Much discussion followed, on the proper place to signal this information, with the eventual consensus being that this was not appropriate to signal in RTP packets. Further action was deferred until, and unless, a more complete use case is defined.

Extended RTP Profile for RTCP-based Feedback

Joerg Ott updated the group on the profile for RTCP-based feedback (draft-ietf-avt-rtcp-feedback-01.txt) and the associated simulation results (draft-burmeister-avt-rtcp-feedback-sim-00.txt). Changes since the last meeting include investigations into timer distribution and dithering schemes, updates to the security considerations and introduction, and extensive simulation, results of which are reported in draft-burmeister-avt-rtcp-feedback-sim-00.txt. There are also two interoperable implementations, which seem to validate the simulation results. There is one remaining open issue: suppression of unknown feedback. What should a receiver do if it receives a feedback packet it does not understand? Should it suppress its own feedback? The authors propose that receivers should not suppress if they receive unknown packets. Steve Casner agreed, saying that he was now convinced that there is not an implosion problem. However, he noted that the draft needs to explain why there is not an implosion problem so that others (especially the IESG) will not be left wondering. The result, Joerg noted, is that the Generic INFO packet needs to be removed; timer use also should be clarified. Once this is done, it is expected that the draft will be ready for last call. Steve asked if folks were comfortable with this extension going to WG last call? No objections were raised.

RTP retransmission

David Leon discussed changes to the RTP retransmission framework (draft-leon-rtp-retransmission-01.txt). The payload format has been changed to better use the RTP header; the rules explaining how to associate the original and retransmitted streams have been clarified; and the reason why retransmissions are sent in a different stream to the original has been clarified (to allow the receiver to differentiate between losses in the original stream and losses of retransmitted packets, and to avoid corrupting the RTCP jitter estimate). An implementation exists, and performance results were briefly presented and discussed. Steve Casner noted that the congestion control section needs expansion and clarification.

David Leon asked for feedback on the trade-off involved in having two RTP sessions, versus sending the feedback on the same session as the original media. Steve Casner noted that this is the main difference between this work and draft-ietf-avt-rtp-selret-03.txt. Stephan Wenger commented that a recent ITU-T meeting had a proposal for using the SSRC to identify sessions, rather than the UDP port number, since some implementations have problems when many ports are used. He doesn't like that proposal, but we might want to take this type of implementation consideration into account. It was also noted that there is no way of specifying that some subset of packets in a session are not cacheable, and that sending packets in a single stream may confuse naive implementations of caches. Steve Casner noted that he likes the idea of keeping the retransmissions separate, since it makes the protocol cleaner, leaving multiplexing to the lower layers.

Henning Schulzrinne raised the issue of piggybacking retransmitted packets on the original packets, for example using the parity FEC format to send retransmitted data (to reduce the overheads). Jonathan Rosenberg noted that the RFC 2733 format could be used unchanged as a retransmission format, as an alternative to this protocol. Steve Casner noted that RFC 2198 might also be suitable, although it has a different set of constraints. Colin Perkins noted that RFC 2198 also doesn't completely preserve the RTP header fields.

Dave Singer raised the issue of congestion control, citing the positive feedback loop inherent in any non-congestion controlled retransmission. This is clearly something that is of importance, and should be addressed in the draft.

Further discussion is clearly needed, on the mailing list.

RTCP Extension for SSM Sessions with Unicast feedback

Julian Chesterfield presented an RTCP extension for single source multicast (draft-chesterfield-avt-rtcpssm-02.txt), using unicast for RTCP feedback. Changes since the last meeting: restructure with more focus on security; modularize the RSI packet format with addition of a new report block (the loss jitter summary block); SDP session level attributes addressed; the source may no longer change reporting formats during a session; the complexity of the mixer/translator rules has been removed; more details on identifying SSRC collisions have been added.

Feedback was solicited on the loss jitter summary blocks. Colin Perkins wondered if these RTCP extensions, or something like them, could be extracted from this draft to make them more generally useful? Steve Casner noted that this has, so far, been a paper design process. We may not know if this is useful until we have real implementations. Julian noted that he has an implementation in progress.

Steve Casner noted that the expanded discussion of security and authentication is good, but wondered if it is sufficient (for example, because this puts requirements for authentication on the signaling protocol used with it, which may not be known). Ross Finlayson wondered if there is some way in which receivers can know to stop sending RTCP, if they do not hear from the sender for some time period, to limit the duration of any packet bombing attack?

Extending RTCP for call quality metrics

The final presentation was by Alan Clark, introducing new work that is not yet described in an internet-draft. The aim is to include extra information in RTCP, to report on perceived voice quality, after all error concealment and correction has been performed. Dave Oran noted that all this information is available today using SNMP; the motivation is presumably therefore so the sender can use this information to adapt its transmission? What is the advantage of including this information in RTCP, rather than using SNMP? Alan disagreed somewhat, saying that the information was useful for network performance monitoring. Dave Oran noted that RTCP does not necessarily follow the same path as the media, so you might not be able to see the feedback anyway? Henning Schulzrinne noted that the routing often becomes symmetric at the edges, so this may not be an issue. Also, he noted that inter-domain SNMP is often not possible, so this may be useful for that purpose. Henning wondered how the impairment could be expressed, MOS or PESQ scores? Yes, perhaps.

Steve Casner wondered if the sender or receiver could use this information to do things differently. If not, why include this? For example, the distribution of loss will clearly have an impact on the quality, but knowing it does not necessarily allow the receiver to do something differently. The use case needs to be well defined. Colin Perkins expressed a similar concern: he can see how more detail on the loss or jitter patterns can be used to adapt, but how does a perceived quality score help? There was much related discussion. Henning Schulzrinne noted that SLA monitoring might be one application, as might be route selection between the public network and a private leased line. Colin Perkins noted that there has been earlier work on providing more detailed RTCP feedback in AVT, which might be relevant. Steve Casner asked if there are any IPR considerations on this work? There are not.

The next step in this work is to prepare a draft on these ideas, for review at future meetings.

The meeting concluded with a brief demonstration of the retransmission work, by David Leon.


Relations between the 4 MPEG-4 IETF documents
RTP Classifier Extension
RTCP-based Feedback: Concepts & Message Timing Rules
RTP retransmission framework
RTCP Extension for Single Source Multicast
RTCP Extensions for VoIP
RTP payload for MPEG4 FlexMultiplexed streams
RTP Payload Format for EFR Speech Codec
An RTP Payload Format for EVRC, SMV, and Other Frame-Based Vocoders
RTP Payload format for AC-3 Streams
The MIDI Wire Protocol Packetization
RTP Payload Format for Distributed Speech Recognition (DSR)
RTP Payload Format for JPEG 2000 video streams
Changes in RTP Spec