2.8.1 Audio/Video Transport (avt)

NOTE: This charter is a snapshot of the 51st IETF Meeting in London, England. It may now be out-of-date. Last Modified: 31-Jul-01


Stephen Casner <casner@acm.org>
Colin Perkins <csp@isi.edu>

Transport Area Director(s):

Scott Bradner <sob@harvard.edu>
Allison Mankin <mankin@isi.edu>

Transport Area Advisor:

Allison Mankin <mankin@isi.edu>

Mailing Lists:

General Discussion: avt@ietf.org
To Subscribe: avt-request@ietf.org
Archive: http://www.ietf.org/mail-archive/working-groups/avt/

Description of Working Group:

The Audio/Video Transport Working Group was formed to specify a protocol for real-time transmission of audio and video over UDP and IP multicast. This is the Real-time Transport Protocol, RTP, together with its associated profile for audio/video conferences and payload format documents.

The current goals of the working group are to revise the main RTP specification and the RTP profile ready for advancement to draft standard stage (including the sampling algorithms for use with very large groups, which have been broken out into a separate document), to complete the RTP MIB, to produce a guidelines document for future developers of payload formats and to continue development of new payload formats.

The payload formats currently under discussion include a number of media specific formats (MPEG-4, DTMF, PureVoice) and FEC techniques applicable to multiple formats (parity FEC, Reed-Solomon coding).

Archive before July 2001: ftp://ftp.es.net/pub/mail-archive/rem-conf/

Goals and Milestones:



Working group last call on guidelines for payload format writers (BCP)



Post revised DTMF payload format draft, ready for WG last call



Post revised RTP spec and audio/video profile



Working group last call on parity FEC draft (standards track)



Post revised RTP MIB and issue working group last call (stds track)



Post RTP implementation checklist draft



Post revised draft on PureVoice (qcelp) payload format to address WG last call comments



Post payload format for MPEG-4 based on MPEG/IETF joint meetings



Post revised RTP membership (SSRC) sampling draft



Submit RTP MIB to IESG for publication as Proposed Standard RFC



Submit guidelines for payload format writers for publication as a BCP



New working group last call on PureVoice payload format



Analysis/simulation of multiplexing payload format proposals



Working group last call on revised SSRC sampling draft (experimental)



Post final revision of RTP spec and A/V profile drafts



Revise MPEG-4 payload format document after implementation experience



Working group last call on RTP and A/V profile (for Draft Standard)



Decide how to proceed with multiplexing protocol: one generic payload format or a number of application specific formats



Prepare MPEG4 implementation results ready for WG last call



Post final revisions of selected multiplexing protocol draft(s)



Working group last call on multiplexing payload format (stds track)

Request For Comments:






RTP: A Transport Protocol for Real-Time Applications



RTP Profile for Audio and Video Conferences with Minimal Control



RTP payload format for H.261 video streams



RTP Payload Format of Sun's CellB Video Encoding



RTP Payload Format for H.263 Video Streams



RTP Payload for Redundant Audio Data



RTP Payload Format for MPEG1/MPEG2 Video



RTP Payload Format for Bundled MPEG



Options for Repair of Streaming Media



RTP Payload Format for the 1998 Version of ITU-T Rec. H.263 Video (H.263+)



RTP Payload Format for BT.656 Video Encoding



RTP Payload Format for JPEG-compressed Video



Compressing IP/UDP/RTP Headers for Low-Speed Serial Links



An RTP Payload Format for Generic Forward Error Correction



Guidelines for Writers of RTP Payload Format Specifications



Sampling of the Group Membership in RTP



RTP Payload for Text Conversation



RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals



RTP Payload Format for Real-Time Pointers



Real-Time Transport Protocol Management Information Base



Registration of parityfec MIME types



RTP payload format for MPEG-4 Audio/Visual streams



RTP Payload Format for ITU-T Recommendation G.722.1



A More Loss-Tolerant RTP Payload Format for MP3 Audio

Current Meeting Report

Audio/Video Transport Working Group Minutes
Reported by M. Reha Civanlar, edited by Stephen Casner

The Audio/Video Transport working group met twice at the 51st IETF in London. As usual, we had a full agenda. In the first session, we discussed document status, advancement of the RTP specification and the audio/video profile to draft standard, a proposal for RTP/RTCP modifications for NAT and firewall traversal, datagram control protocol proposal, RTCP extensions for SSM, and SRTP and related key management updates. In the second session, we discussed proposed modifications to RFC 2250, RTP profile for RTCP based feedback, RTP retransmission framework, and RTP payload formats for phonemes/facial animation, timelined static media, distributed speech recognition, EVRC speech, MPEG-4, and progressive media.

The meeting opened with a document status update by Steve Casner. The group has had a single RFC published since the last meeting: Loss-tolerant format for MP3 audio (RFC 3119). The RTP Testing strategies draft was with the RFC Editor and has since been published as RFC 3158; the RTP payload for AMR and AMR-WB audio is waiting for AD approval. The RTP payload formats for DV audio & video, which had already been reviewed by the IESG, were edited to fix some minor remaining problems. Since the meeting, these drafts have been approved by the IESG.

The comfort noise payload format draft is being considered for last call. Steve Casner mentioned a problem with a change in the recent draft which specifies using the RTP clock rate of the most recent packet instead of the constant clock rate of 8000 Hz as specified for the static payload type 13. This would change the meaning of a static payload type, which is a binding of a particular payload format and a set of parameters, including the RTP clock rate. The profile would have to be modified to allow this re-definition. An alternative is to use a dynamic payload type when a different sampling rate is needed for the CN payload format. Comments were solicited; there were none at the meeting, but the question will be posed on the mailing list.

The next two drafts considered for last call are related to compressed RTP: an enhanced version of CRTP (ECRTP) to support paths with long delay and higher loss rates, and tunneled CRTP (TCRTP) which uses ECRTP to multiplex several RTP streams with the same endpoints for reduced overhead. The only change in the TCRTP draft is that now it allows use of compression methods other than CRTP as long as they can tolerate delay, loss and misordering. The ECRTP draft has been significantly simplified by including only those extensions required to support TCRTP, and the introduction now more clearly explains that motivation. This draft now stands by itself and references RFC 2508. There is one issue requiring discussion by the WG: ECRTP no longer specifies that the IPv4 ID field is added to the UDP checksum by the compressor and subtracted by the decompressor, and similarly that the ID field is not included in the header checksum that is used instead of the UDP checksum when the latter is zero (disabled). The consequence is that the validity of the ID field cannot be assured when the "twice" algorithm is used, but there seems to be less concern about this now than before (since ROHC WG is considering zero-byte compression). Lars-Erik Jonsson agreed that it is best not to disturb the end-to-end semantics of the UDP checksum, but asked if this means that the "twice" algorithm is no longer allowed. Steve Casner responded: no, but the validity of the IP ID may not be maintained. Is this an acceptable position? Lars-Erik said no because ROHC does maintain the ID with a header checksum and zero-byte compression can't be used when the ID would not be maintained. He suggested other possible mechanisms be considered for ECRTP, but Steve countered that it would be better not to add any bits to CRTP. This issue needs to be resolved on the mailing list.

Three new drafts were mentioned just briefly: One calling for RTSP to be carried over HTTP is transferred to MMUSIC WG. A second defining MIME subtypes for JPEG 2000 is straightforward, but there may be an overlap with MPEG4 MIME subtypes to be discussed along with the MPEG4 payload type later. Lastly, the registration of a MIME subtype for Ogg bitstream format is independent of RTP, although it should note that there will be a separate MIME registration for an RTP payload format to carry the same media types (such as Vorbis).

Advancement of RTP to Draft Standard

The interoperability matrix for the RTP specification is now 100% complete thanks to Colin Perkins and Magnus Westerlund for testing the open items left from the previous meeting. The latest RTP spec draft-ietf-avt-rtp-new-10 incorporates a few clarifications from the last meeting, including how to calculate the avg_rtcp_size and senders variables, and that when a compound RTCP packet is split into two parts for partial encryption, the CNAME goes in only one part.

A few additional questions have arisen since the latest draft. The first was whether CNAME is required in each compound RTCP packet if one MTU-size packet cannot hold reports on all sources. The answer is that only one compound packet must be sent per reporting interval and if all of the sources won't fit then a subset should be selected round-robin across multiple intervals. The second question was asked by Amos Guler on the list regarding jitter computation for variable-duration packets. Amos suggested redefining the packet "sending time" to be RTP timestamp plus packet duration. Steve agreed this would solve the problem, but he is hesitant to change such a fundamental part of the standard at this point. He added that the jitter calculation results are not to be taken quantitatively, but are to be used for relative comparison of different sources or a single source over time. He noted that the variation in the packet sizes might be considered as part of the jitter also since the receiver buffer must accommodate it. The jitter computation problem also arises for video, as was discussed later regarding the RFC 2250 revision.
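The round-robin selection of sources described above can be sketched as follows. This is only an illustration, not text from the RTP spec: the function name select_report_subset, the start index carried between intervals, and the max_blocks limit are all hypothetical names chosen for the example.

```python
def select_report_subset(sources, start, max_blocks):
    """Pick up to max_blocks sources to report on in this interval.

    sources    -- list of SSRC identifiers currently being received
    start      -- index at which the previous interval stopped (hypothetical state)
    max_blocks -- how many report blocks fit in one compound packet (assumed given)

    Returns (subset, next_start) so the next interval continues where this one ended.
    """
    if not sources:
        return [], 0
    count = min(max_blocks, len(sources))
    subset = [sources[(start + i) % len(sources)] for i in range(count)]
    return subset, (start + count) % len(sources)


# Five sources, room for two report blocks per compound packet.
ssrcs = [0xA1, 0xA2, 0xA3, 0xA4, 0xA5]
first, nxt = select_report_subset(ssrcs, 0, 2)
second, nxt = select_report_subset(ssrcs, nxt, 2)
third, nxt = select_report_subset(ssrcs, nxt, 2)
```

Over three intervals every source is reported at least once, and exactly one compound packet is sent per interval.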

The interoperability matrix for the RTP audio/video profile draft is also complete except for a few payload formats. Several payload formats that had been removed from the previous revision of the profile draft were added back in draft-ietf-avt-profile-new-11 because interop reports were received. The remaining deletions are GSM-HR, 1016 LPC, MPEG1 System and MPEG2 Program streams. Interoperability reports for these payload formats are still being solicited in case we have time to add them back during the IESG review. The revised draft also clarifies that the marker bit for audio is set only when packets are intentionally not sent during silence. The profile now includes references to draft-ietf-avt-rtp-mime in the description of each payload format for which optional parameters are defined in the rtp-mime draft. Colin Perkins noted that there was a lot of discussion on the list regarding these optional parameters for G.729 and urged everyone to check these drafts to make sure all the questions are answered.

Steve reminded the group that the purpose of the rtp-mime draft is to register MIME subtypes for all payload formats included in the A/V profile. In this -05 revision, a maxptime parameter was added. This global, optional parameter is to be used with any payload format to specify a maximum duration allowed for the media in the packet. A new parameter was defined because the existing ptime parameter is only a "recommended" maximum. Also, a channel-order parameter was added for L16 as it is used for DV audio. Steve asked MIME syntax experts to check the extended syntax for the RED audio payload format parameters to allow use of dynamic payload types. Another revision will be issued per Allison Mankin's request to add a section with a summary list of the MIME subtypes being registered.

A second auxiliary draft to the RTP spec and profile is draft-ietf-avt-rtcp-bw-04 which defines SDP bandwidth attributes for RTCP. In response to some questions on the list, it was clarified that bandwidth may be set higher than 5% as well as lower, and that the bandwidth limit applies to all RTCP packet types. The note in the draft about the bandwidth units difference was updated consistent with the revised SDP draft.

Steve and Colin will finalize the RTP advancement request and interop statement soon after the meeting when the revised drafts are posted, and then submit the package to the IESG.

RTP through NATs and firewalls

The next presentation was by Jonathan Rosenberg. He said that SIP usage of RTP will fail when there is a NAT between the two parties because addresses and ports are carried in the SDP packets exchanged during the call establishment phase. Furthermore, SIP's usage of RTP is not firewall-friendly either because there is no static policy for port usage that can be set by a firewall administrator to let RTP packets in and out of a network behind a firewall. Two changes regarding the use of ports in RTP are proposed to deal with these problems, as described below. A third proposal, to allow RTP and RTCP to be multiplexed on the same port, was discussed at length on the mailing list, but Jonathan has concluded that the issues involved in implementing such a change are probably not worth the expected gain.

The first proposed change is to relax the requirement that the RTP and RTCP for a particular session be transmitted on an even-odd pair of ports. This requirement will be broken by port mapping through a NAT because even though the RTP client on the internal side of the NAT uses an even-odd port pair, the NAT may map these ports to external port numbers arbitrarily. The proposal is to allow for RTP and RTCP to use non-consecutive ports. Steve Casner stated that it is reasonable to relax the RTP spec in this regard, but not to strike the even-odd port requirement entirely. He proposed a revision to say that if only one port is specified and the other port is implicit then the text as it stands applies. If both ports are explicitly specified by some signaling protocol, then they are not restricted. Jonathan agreed with this, and mentioned that Christian Huitema will propose in the MMUSIC WG a corresponding extension to SDP so that the non-consecutive RTCP port number can be explicitly signaled. Christian asked about the rule that if an odd port number is specified for RTP that it is to be reduced by one for the RTP port and the odd number used for RTCP. Steve responded that this rule is applicable only if just one of the port numbers is specified and the other one is implicit, so the current text still applies. There was rough consensus to put Steve's suggested wording into the RTP spec.
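Steve's suggested rule can be sketched as below. This is a minimal illustration of the wording discussed at the meeting, not text from the RTP spec; the function name resolve_rtp_ports and its argument names are hypothetical.

```python
def resolve_rtp_ports(rtp_port, rtcp_port=None):
    """Return the (rtp, rtcp) port pair implied by the signaled values.

    If only one port is signaled, the existing even-odd text applies:
    an even port is used for RTP and the next higher port for RTCP,
    and an odd port is reduced by one for RTP and kept for RTCP.
    If both ports are signaled explicitly, they are used unchanged.
    """
    if rtcp_port is not None:
        # Both ports explicit (e.g. via a signaling extension): no restriction.
        return rtp_port, rtcp_port
    if rtp_port % 2 == 1:
        # Odd port given: RTP moves down by one, RTCP uses the odd number.
        return rtp_port - 1, rtp_port
    return rtp_port, rtp_port + 1


implicit_even = resolve_rtp_ports(5004)
implicit_odd = resolve_rtp_ports(5005)
explicit_pair = resolve_rtp_ports(5004, 6001)
```

The third call models a NAT that has mapped the two ports to unrelated external numbers, which only works when the RTCP port is signaled explicitly.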

Christian asked what happens if the RTP and RTCP ports are explicitly specified to be the same. Marshall Eubanks suggested putting a constraint in the RTP spec to disallow this. A concern was that a symmetric NAT may always return the same port. Considering the wide variations in NAT implementations, it was agreed that such possibilities might exist but are not expected to be common.

Jonathan described how non-consecutive port numbers are also required for his proposal for "symmetric RTP" in SIP. In this scheme, the active participant in the connection establishment will connect to the explicit ports of the passive participant, but the passive participant will send packets back to the source ports of the active participant so those packets will be allowed to flow through a symmetric NAT. This scheme does not require any further change in RTP itself.

The second change Jonathan proposed was to allow multiple RTP sessions to share the same pair of (well-known) port numbers to avoid the requirement to open up a firewall for dynamically allocated port numbers, which firewall administrators consider unacceptable. The proposal is to use the SSRC for demultiplexing as proposed by Robert Fairlie-Cuninghame and discussed on the mailing list. Jonathan asked if this is reasonable. Steve noted that the problems with using SSRC for multiplexing are explained in detail in the RTP spec, and stated that his personal opinion is that this is too much of a fundamental part of the RTP structure to break for this purpose. Dave Oran quoted "path to hell is paved with good intentions" as a meta comment. He said he agrees with the goodness of the intentions but thinks that the path goes to hell. He suggested going back to promoting RTP to an IP protocol as a much cleaner solution. Christian stated that a better solution would be deploying IPv6 but agreed that making applications more complicated to make life easier for NATs is not a good idea. MIDCOM should provide some solutions, but not soon. Several people expressed the concern that this is a vital issue and that without a solution wide scale use of RTP and, in particular, voice over RTP, will not be possible. Colin Perkins noted that discussions on this issue should continue on the list.

Datagram Control Protocol

Mark Handley gave a preview presentation of a new transport protocol proposal, DCP, which will be discussed in the TSV WG. DCP is relevant to AVT because it could be used to carry RTP. The idea of DCP is to provide a good, shared, kernel-based implementation of congestion control for applications that require flow-like unicast semantics but not absolute reliability. It uses a three way handshake for connection set up. It has explicit service identification in addition to source and destination ports. It has packet sequence numbers built-in and directly supports explicit congestion notification. There is a common packet header including a sequence number field which is larger than that of RTP. It contains TCP like congestion control, TCP friendly rate control and single window congestion control. DCP features an ACK vector which can provide packet-level feedback.

Mark asked for feedback from the group about whether the feature set makes sense for RTP over DCP. Steve mentioned that if DCP becomes a reality, the right approach might be to define RTP version 3 to be used over DCP. Dave Oran compared DCP with SCTP and said there are overlaps and these need to be studied to understand if SCTP can be made to cover everything or if the overlaps should be eliminated and two new protocols developed. Steve mentioned that these questions will be discussed in the TSV WG.

RTCP Extension for Source Specific Multicast Sessions

Julian Chesterfield presented draft-chesterfield-avt-rtcpssm-01.txt on unicast feedback of RTCP reports originally motivated by the requirements of RTCP usage in SSM sessions. In this revision, the draft is extended to cover any generic single source session. Some of the security issues are addressed, but this is not complete yet and feedback is solicited. A dual feedback method is proposed. In the simple method, the source reflects the receiver reports to all receivers unmodified. In the packet summary method, all the RTCP information is summarized in a new summary packet format providing for some bandwidth savings. The summary packet conveys average packet loss and jitter information along with some information about the statistical distribution, plus group size and average RTCP packet size information so that receivers can still perform the reporting interval calculation. Steve Casner mentioned that the cumulative number of packets lost is only useful when used in conjunction with the extended highest sequence number. He also observed that further analysis may be needed to verify that chosen statistics will be meaningful to the receivers.

The proposal addresses dynamically changing the feedback type during a session. Colin Perkins noted that this will add a lot of complexity and probably shouldn't be done. The draft does mention that mixed groups using both formats could operate through a translator. There may be problems with additional sources joining in and summarizing hosts disappearing.

The draft specifies basic security checks but the authors are unsure about the extent to which they should be addressing these issues. Colin stated that at least all the issues need to be explained. The main issue is that when a false source address is specified, a receiver sends packets to an unsuspecting source. Steve said certain mechanisms may be simplified by restricting this to SSM, and suggested concentrating on that case. Christian Huitema asked if the mechanisms in the paper by Turletti and Wakeman on scalable feedback control for multicast group published six years ago addressing the same problem have been considered. Issues to be addressed in the next revision include the security issues and SDP extensions.

Secure RTP (SRTP)

The next presentation by Elisabetta Carrara was an update on the state of secure RTP (draft-ietf-avt-srtp-01.txt). It is a security protocol defined as a new profile RTP/SAVP to provide confidentiality, integrity and header authentication, including limited authentication when full authentication is not affordable. The design goals are low computational cost, small footprint, limited packet expansion, and no error propagation. IP/UDP/RTP headers are left unencrypted so that they can be compressed. In the new version, source origin authentication is added and formation of compound authentication is allowed.

More explanation was added on how to determine the crypto context (mapping the keys to the packets). There are two ways. The implicit mapping uses the existing fields e.g., SSRC, timestamp, etc., so it does not expand the packet. But this is not a robust approach, so an optional explicit 32-bit SPI field was added at the end of the RTP packet to allow very fast re-keying in a large group. Colin Perkins commented about a previous request for hiding the authentication tags in the padding bits to allow backward compatibility for receivers that don't know about authentication. Steve Casner said having SPI to be explicit is fine because this is a new profile, but this needs to be clearly signaled.

A problem exists when it is desired to share the same key for all RTP sessions in a multimedia session because SSRC may not be unique across the RTP sessions. The solution requires SSRC to be globally unique across those RTP sessions or to use a unique key for each sender.

Source origin authentication was added, per requests from the previous meeting, using TESLA delayed authentication. There are clock synchronization and buffering problems with this, but one of the TESLA co-authors clarified that TESLA can use buffering just at the sender. Several people expressed concern about the timeline for TESLA's development because there are applications anxiously awaiting SRTP. TESLA is said to be implemented and working fine. It is an IRTF draft that will transition to an IETF draft in MSEC in a couple of months. SRTP may be ready for last call sooner. TESLA was specified in SRTP because it is one of the most stable delayed authentication schemes, but use of other schemes is also allowed. Elisabetta said delayed authentication could be removed. Colin responded it must either be a complete specification or be removed.

Colin asked: how do you encrypt the RTCP packets? Encryption should be applied to the compound RTCP packet only. Colin suggested clarifying the examples to reflect this. Other parts of the draft to be improved include the SDP attributes section and FEC interactions.

Anders Klemets asked why SRTP needs to be a new RTP profile because he is concerned about the need to combine the functions of multiple profiles even though operation only under a single profile is allowed. Steve responded that SRTP requires some different rules such as global uniqueness of SSRC and the presence of the SPI trailer. He claimed that the number of profiles is not expected to be large, and that the useful combinations should be verified to interact properly and then given a new name as a combined profile.

SRTP is independent of key management, but updating the crypto context is tricky. A section has been added to specify what SRTP requires from key management. SDP attributes are defined for this. Joerg Ott asked for a presentation of this work in MMUSIC WG.

The SRTP authors have been requested to add a key management protocol, so there are two new drafts on key management which will be discussed in the MSEC WG. Dave Oran noted that the signaling protocols need to address secure key distribution, and Jonathan Rosenberg asked why these aren't SDP extensions. Both expressed concern about getting involved with the complexities of multicast keying and suggested that the unicast problem is much easier to solve and needs to be tackled first and quickly so that SRTP users don't invent their own non-standard security techniques. The response was that as far as key management is concerned, multicast and unicast are not much different. The main issue is dynamic membership and the techniques work without that.

Update of RFC 2250

The second session of the AVT WG started with Reha Civanlar's presentation on the collection of requests for modifications to RFC 2250 "RTP Payload Format for MPEG1/MPEG2 Video" since its publication in January 1998. There are a few requests that make minor functional changes to the payload format, and others that request clarifications of the text or changes to the document structure.

The first request is to change the reference for the MPEG standard in RFC 2250 to a more restrictive reference defined by ATSC because ATSC and SMPTE were considering using MP2T, but they need a more constrained definition. Michael Dolan made this request about a year ago. The consensus was that the RFC 2250 reference should remain as it is, but that if this request is still active, the same payload format specification could be used with a dynamic payload type indicating use of the more restrictive ATSC reference. This can be implemented as a separate document to define a new MIME subtype that references both the RFC 2250 revision and the ATSC document.

The second issue in the first group was about the clock resolution used in the MP2T payload type. RFC 2250 uses a 90 kHz fixed clock frequency for all five payload formats it specifies. Humphrey Liu wanted to increase this to 27 MHz for improved jitter compensation when transporting high data rate multiprogram MP2T payloads, and also to constrain the bitrate changes to "piecewise CBR" or "relaxed piecewise CBR". It was decided in a previous AVT meeting that this request should be supported by using a new dynamic payload type. This payload type uses the MP2T payload format but allows specification of the clock frequency through out-of-band means, e.g., SDP. The use of this dynamic payload type will be described in the RFC 2250 revision.

The last functional change requested was related to the RTCP jitter computation. An MPEG video frame may need to be carried in more than one RTP packet. All such packets carry the same timestamp although they are sent at different times, and this causes the calculated jitter to be larger than the true network jitter. This problem exists for other payload formats, too, but it is particularly severe when B-frames are used in MPEG. The send times of both P- and B-frames are not related to their timestamps because a future P-frame must be sent prior to all B-frames that refer to it. Reha noted that there was a recent comment on the list stating that the jitter computation based on the existing definition may actually be useful for the receiver buffer size determination. He pointed out that the problem will be with the media unaware network elements which may try to determine network state parameters, such as congestion, based on jitter computations which will not reflect the network jitter in such cases.
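The effect described above can be reproduced with a small sketch of the interarrival jitter estimator from the RTP spec, J = J + (|D| - J)/16, where D is the difference between the arrival spacing and the timestamp spacing of consecutive packets. The packet traces below are invented for illustration (a 90 kHz clock, 3000 ticks per frame, three packets per frame sent 1000 ticks apart, and a network that adds no jitter at all); they are assumptions, not measurements.

```python
def interarrival_jitter(packets):
    """Running jitter estimate over (rtp_timestamp, arrival_time) pairs.

    Both values are expressed in RTP timestamp units. The update rule is
    the one in the RTP spec: J += (|D| - J) / 16.
    """
    jitter = 0.0
    previous = None
    for timestamp, arrival in packets:
        if previous is not None:
            d = (arrival - previous[1]) - (timestamp - previous[0])
            jitter += (abs(d) - jitter) / 16.0
        previous = (timestamp, arrival)
    return jitter


# One packet per frame, delivered exactly on schedule.
single = [(n * 3000, n * 3000) for n in range(50)]

# Each frame split into three packets that share one timestamp but are
# paced 1000 ticks apart; delivery is still perfectly regular.
fragmented = [(n * 3000, n * 3000 + k * 1000)
              for n in range(50) for k in range(3)]

jitter_single = interarrival_jitter(single)
jitter_fragmented = interarrival_jitter(fragmented)
```

Even though neither trace contains any network jitter, the fragmented trace yields a clearly non-zero estimate, because packets with equal timestamps are sent at different times.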

Reha suggested that the MPEG payload format should specify a modified jitter computation that accurately reflects network jitter. Steve Casner asked for WG feedback on whether it is reasonable for a payload format to specify a change in the jitter behavior from what the RTP spec defines or whether the jitter computation in the RTP spec needs to be modified in some way. Steve said a concern is that the suggested approach requires significant knowledge about the datastream in order to do the jitter calculation and in many implementations this may be done in a payload independent portion of the implementation. Reha stated that the results computed this way will be wrong. Marshall Eubanks said it doesn't seem to be a good idea to require people to do something that is wrong. Steve reiterated his previous comment about the use of the jitter as a qualitative value to compare the performance of several servers or a single server over time. This is a significant issue that needs to be discussed further on the list.

The remaining requests are not functional changes to the payload formats. Three textual clarifications were briefly introduced: an explanation to be added into the Appendix and clarifications related to M-bit usage and the fragmentation rules. Reha asked everyone to check the details of the proposed wording changes which are in the viewgraphs included with the proceedings. Ross Finlayson requested that RFC 2250 be split into three separate RFCs for systems streams, video elementary streams, and audio elementary streams. Reha disagreed with the reasons for the split. There were no objections to keeping RFC 2250 as a single document. The last request was to prepare an informational RFC outlining the use and interrelations between the several payload formats for MPEG-1, -2 and -4. Steve said such a guideline would be very welcome if someone who knows about all these formats will volunteer to write it.

Extended RTP Profile for RTCP-based Feedback

Joerg Ott gave the next presentation on extending RTCP-based feedback for providing immediate responses. A new profile is proposed to extend the existing A/V profile but eliminate the five second minimum for the interval between RTCP packets sent by any recipient while still keeping within the overall RTCP bandwidth limits. This allows timely feedback for signals such as ACK, but it is a probabilistic mechanism and is not a substitute for reliable transport via TCP. This draft-ietf-avt-rtcp-feedback-00.txt is the merge, by request from the last meeting, of draft-wenger-avt-rtcp-feedback-02.txt and draft-fukunaga-low-delay-rtcp-02.txt into one new RTP profile named RTP/AVPF. The packet format structure is simplified to require fewer codepoints. Some SDP specs and an IANA considerations section were added, and the text was clarified. Many simulations are being run. Steve Casner thanked the authors for doing the merging work.
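To see why the five second minimum matters for timely feedback, consider a deliberately simplified form of the RTCP interval calculation: the interval is the larger of the minimum and the time needed for every member to send one average-sized RTCP packet within the RTCP bandwidth. The sketch omits randomization, the sender/receiver bandwidth split and timer reconsideration, and it does not attempt to reproduce the rules of the AVPF draft; the function name and the example numbers are assumptions.

```python
def simplified_rtcp_interval(members, avg_rtcp_size, rtcp_bandwidth, t_min):
    """Deterministic RTCP reporting interval in seconds (simplified).

    members        -- number of session participants
    avg_rtcp_size  -- average compound RTCP packet size in octets
    rtcp_bandwidth -- RTCP share of the session bandwidth in octets/second
    t_min          -- minimum interval in seconds (5.0 in the A/V profile)
    """
    bandwidth_limited = members * avg_rtcp_size / rtcp_bandwidth
    return max(t_min, bandwidth_limited)


# Four members, 100-octet reports, 400 octets/s of RTCP bandwidth
# (5% of a 64 kb/s session).
with_minimum = simplified_rtcp_interval(4, 100, 400.0, 5.0)
without_minimum = simplified_rtcp_interval(4, 100, 400.0, 0.0)
```

In this small group the bandwidth limit alone would allow a report every second, so the fixed minimum, not the bandwidth share, is what delays feedback.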

Joerg continued with the open issues. The first one is related to the definition of the maximum dither time, T_dither_max. This parameter is zero for a group of two, but scales linearly with the number of members and the RTT. In a very large group, this may result in very long waiting times. Henning Schulzrinne suggested that this problem might be the same as the one for which RTP timer reconsideration was introduced, but after some discussion it was determined that the problems are different. Christian Huitema said the feedback formula used for deciding when to send what should take the observed loss rate into account. Joerg concluded the T_dither_max issue by saying that the current formula works fine with synchronized losses, but statistically increases the delay for unsynchronized losses and they could not find a method to deal with this.

The second open issue was that many implementations will not recognize the new profile name and hence will not join a session even if they would be able to participate in it. The solution may be to provide alternative specifications for both profiles in SDP using the fid attribute. Anders Klemets asked about the value of using the name AVPF in the SDP description. Steve replied that it is necessary to make sure the set of participants is working together under the same set of rules in the new profile. Anders said this will make the draft depend on the fid draft. Joerg replied that the fid draft is becoming an RFC soon so it should not be a problem.

The main remaining work for this draft is the definition of the formula for calculating the response time. Joerg said that this will be feasible. Colin Perkins asked whether anything explicit needs to be done to cope with changes in the group size. Joerg replied that at the time of transmission, regular RTCP reconsideration is to be performed. The second comment from Colin was that he was not particularly keen on the single RTCP type with subtypes, which overloads the field used for report count in other RTCP packet types. Steve disagreed, since the APP packet uses a subtype there. Steve noted the importance of the results of the simulations for this work and invited publication of those results as an I-D and perhaps an RFC.

RTP retransmission framework

David Leon presented a new draft specifying an RTP retransmission framework. This framework does not address receiver feedback which is covered by the profile in the previous presentation. It is related only to retransmission of the data. In this approach, the original RTP stream is not modified. Retransmitted packets are sent in a separate stream, either using the FEC payload format in RFC 2733 or using the new retransmission payload format defined in this draft. Using the FEC payload format for retransmission is accomplished by setting the SNbase field of the FEC packet header to the retransmitted packet's SN and setting the mask to zero. Alternatively, the retransmission payload format is designed to be used when FEC is not supported at the receivers. For this, the retransmitted packet is encapsulated in another RTP packet. A retransmission stream can be sent to the same multicast group (or unicast port) as the original stream with a different SSRC or to a different multicast group (or unicast port) with the same SSRC. Not every receiver needs to subscribe to the retransmission multicast group.
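The use of the FEC payload format for retransmission described above can be sketched as follows. The 12-octet header layout (SN base, length recovery, E bit, PT recovery, 24-bit mask, TS recovery) follows RFC 2733. Filling the recovery fields with the retransmitted packet's own values is an assumption made here, on the reasoning that an XOR over a single packet is the packet itself; the draft may specify this differently. The function names are hypothetical.

```python
import struct

# RFC 2733 FEC header: SN base (16), length recovery (16),
# E (1) + PT recovery (7) + mask (24), TS recovery (32) = 12 octets.
FEC_HEADER = struct.Struct("!HHII")


def build_retransmission_fec_header(seq, length, payload_type, timestamp):
    """Build a FEC header that signals retransmission of one packet.

    SN base is set to the retransmitted packet's sequence number and the
    mask is set to zero, as described in the presentation. The recovery
    fields carry the packet's own values (assumption for this sketch).
    """
    e_bit = 0
    mask = 0  # zero mask marks this as a retransmission
    word2 = (e_bit << 31) | ((payload_type & 0x7F) << 24) | (mask & 0xFFFFFF)
    return FEC_HEADER.pack(seq & 0xFFFF, length & 0xFFFF,
                           word2, timestamp & 0xFFFFFFFF)


def parse_fec_header(data):
    """Split the first 12 octets of a FEC payload into named fields."""
    sn_base, length_recovery, word2, ts_recovery = FEC_HEADER.unpack(data[:12])
    return {
        "sn_base": sn_base,
        "length_recovery": length_recovery,
        "e": word2 >> 31,
        "pt_recovery": (word2 >> 24) & 0x7F,
        "mask": word2 & 0xFFFFFF,
        "ts_recovery": ts_recovery,
    }


if __name__ == "__main__":
    header = build_retransmission_fec_header(1234, 160, 0, 8000)
    fields = parse_fec_header(header)
    # A receiver could treat mask == 0 as "this is a retransmission of sn_base".
    print(fields)
```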

Dave Oran asked why RTP encapsulation is used when the RTP semantics are not, e.g., the timestamp is meaningless. Marshall Eubanks asked if the retransmission stream needs to have its own RTCP stream (yes). There was a comment on the problem with associating the SSRC with the retransmission stream when it is sent to the same multicast address. Colin Perkins asked why not send this to the same multicast group and SSRC with a different payload type. The second author, Viktor Varsa, responded that they wanted to keep it modular so that receivers who don't want to receive retransmissions can avoid them, but Carsten Bormann claimed that sending the retransmissions as a different stream has many drawbacks. Steve Casner mentioned that when the number of receivers is large, retransmission starts becoming ineffective. Dave and Colin noted that going to a separate port will essentially be the same as the RLM technique which is already specified, so FEC and RLM can be used to achieve the purpose of this draft and there is nothing to write. Viktor countered that their framework allows layering, but Dave said that's what RLM is about.

David Leon continued the presentation with the details of the layered payload format which is defined so that a receiver can subscribe to only a subset of the layers and/or retransmission streams. This is adapted from "Layered audiovisual coding for multicast distribution on IP network" by Nilsson, M., Dalby, D., O'Donnell, J., presented at Packet Video 2000. It generalizes to sending a data stream as separate RTP streams with possibly different priorities. David presented the details of the proposed header for the layered payload format. Colin repeated his objection to the usage of the same SSRC in different groups, noting that streams are to be related by CNAME but not by SSRC (which may be affected by collision resolution). Dave Oran noted that he couldn't see in the draft whether this approach breaks SRTP or CRTP. In particular, the encapsulated packets won't get compression of the inner header.

Steve concluded saying that in order to demonstrate the validity of the approach, some simulations testing the questions need to be performed. As it stands, this work is just a theoretical idea that needs to be proven. Colin noted that there has been a lot of academic research in this area and that there should be more references to this work. From the comments in the meeting, it is clear that this proposal is not yet mature and would benefit from additional discussion.

RTP payload for phoneme/facial animation parameters

The next presentation was on a new RTP payload format for phoneme and facial animation parameter streams by Joern Ostermann. This work enables facial animation in an MPEG4 framework using a text to speech (TTS) synthesizer that is network based rather than local to a client as MPEG4 has assumed. Input to a speech synthesizer includes some mark-up information for non-speech related facial animations in addition to text. The mark-up information has transition time and amplitude parameters and these are used in an additive manner. While losing the text input to a speech synthesizer causes loss of some words at the output, losing a facial animation parameter may result in an unintentional facial expression lasting a long time.

Use of a network-based TTS server is motivated by new technology for features such as custom voices which require very large databases. The input to the network-based TTS server is an MPEG4 stream and one of the two outputs is an audio stream; both of these have associated RTP payload formats already defined. However, the second output of the TTS server contains the phonemes and mark-up information for facial animation which require this new payload format. The RTP payload consists of a packet descriptor followed by optional recovery information and optional phoneme and/or facial animation parameter information. The packet descriptor tells if there is recovery information included, and if so, what type. The recovery information may be dynamic or complete. Complete recovery is a complete state update while dynamic recovery covers the facial animation parameter information carried in a limited number of previous packets. There is no recovery information for the phonemes. The internal TTS interface specified in the MPEG standard defines the phoneme descriptors and the FAP descriptors plus value ranges for some of the parameters.

Steve Casner asked whether this payload format specifies some reformatting of MPEG4 data and redundancy information extracted from the data and put into the stream. Joern replied that MPEG4 does not specify this as a data stream but only an internal interface. Steve said the problem with specifying this as an AVT WG item is that this is almost like defining an audio encoding format, which is outside the WG domain. Reha Civanlar noted that the API is well defined in the MPEG4 standard and no new coding is defined in the payload format. Dave Oran asked if the authors had considered putting the recovery information in an FEC stream instead of multiplexing it in the payload. Steve noted that specifying the recovery information in a manner optimized for this application may be more efficient than using a generic method. Steve asked for a hum from the participants on whether this proposal should be accepted as a WG item. The answer was positive.

RTP Payload Format for Time-lined Static Media

Magnus Westerlund presented another new payload format proposal, this one a generic RTP payload format for time-lined static media. He defined timelined static media as static media objects with inherent timeline and temporal properties. The application examples include subtitling, slideshows, ID3 information for Internet radio and many others. Currently, timelined static media is being carried by a mix of various protocols and suffers from several problems. The suggested solution is to use RTP.

Magnus listed the requirements for the payload format: include timing information, be generic, support ALF, be error resilient and support multicast. The proposal is to create a framework containing building blocks that can be referenced by profiles of the payload format for particular applications. Magnus noted that the framework needs to be improved and that a new RTP profile may be required to fit this payload format. Philippe Gentric noted that MPEG4 considered the same problem and decided to model everything as continuous media. Colin Perkins noted that this is getting very close to reliable multicast with a lot of overlap. Rob Lanphier noted the similarity with the "RealPix" model. Steve Casner expressed concern that this could become a monster. He suggested that this needs to go through another round of refinement before adopting it as a WG item. Magnus agreed.

RTP Payload Format for Distributed Speech Recognition

The next presentation was on distributed speech recognition (DSR) by Qiaobing Xie. The motivation for distributed speech recognition is that high-grade speech recognizers are too complex for small clients and that speech recognition is difficult if compressed speech is transmitted from the client. Instead, the DSR codec in the thin client extracts a feature vector and transports it to the recognition engine over RTP. The codec work has been going on outside of IETF for a very long time and is stable. The interface between the recognizer functions is defined by ETSI, but this work has focused on circuit-switched networks. 3GPP wants to add DSR capability to packet switched services.

A first attempt at defining this payload format was presented more than half a year ago and the draft has expired. The changes in the new version are simplified payload headers and support for future front-end types. A one-byte payload-specific header contains a frame pair count (FPC) and an end of speech segment flag. Steve Casner noted that given that the size of the frame pairs is a known constant, a counter for them may not be necessary. Dave Oran asked the reason for not using the RTP marker bit as the end of speech flag. Steve noted that this is a particular type of audio and the convention is to use the marker bit to indicate the beginning of a talkspurt. Colin Perkins said that if the end flag is not really needed the payload specific header could completely disappear. Christian Huitema asked if there was IPR related to this payload format; the answer was no. Dave Oran mentioned that speech recognition experts reported a need to know whether the speech is echo canceled and this may be a parameter for the MIME type. This may be a part of the front end. Qiaobing will check this. Steve noted that the group decided to adopt this work as a WG item when the first version of the payload format was presented.
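The discussion of the one-byte payload-specific header can be made concrete with a small sketch. The minutes do not give the bit layout, so the one used here (end-of-speech flag in the top bit, frame pair count in the low seven bits) is purely hypothetical, as are the function names. The last helper illustrates Steve's point: if the frame pair size is a known constant, the count can be derived from the payload length instead of being carried in the header. The frame pair size is passed in as a parameter rather than assumed.

```python
def pack_dsr_header(frame_pair_count, end_of_speech):
    """Pack a one-octet header (hypothetical layout: flag in bit 7, FPC in bits 0-6)."""
    if not 0 <= frame_pair_count <= 0x7F:
        raise ValueError("frame_pair_count must fit in 7 bits")
    return bytes([(0x80 if end_of_speech else 0x00) | frame_pair_count])


def unpack_dsr_header(octet):
    """Return (frame_pair_count, end_of_speech) from the hypothetical header octet."""
    return octet & 0x7F, bool(octet & 0x80)


def frame_pairs_from_length(payload_len, frame_pair_size):
    """Derive the frame pair count from the payload length alone.

    Works only if every frame pair has the same known size in octets.
    """
    count, remainder = divmod(payload_len, frame_pair_size)
    if remainder:
        raise ValueError("payload is not a whole number of frame pairs")
    return count


if __name__ == "__main__":
    header = pack_dsr_header(3, end_of_speech=True)
    print(unpack_dsr_header(header[0]))
    # With a constant frame pair size (value chosen only for the example),
    # the same count follows from the payload length.
    print(frame_pairs_from_length(36, 12))
```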

RTP Payload Format for EVRC Speech

Peter McCann gave a brief status update on EVRC, standing in for Adam Li. There were eight changes since the last meeting. One of the two major changes is the use of a TOC format where frame type information is placed up front. This eases the use of variable-protection techniques that can protect the formatting information differently from the data. The second big change was about combining interleaving and bundling into the same frame. There will be one more version with minor edits, then the draft will be ready for the last call. There was some discussion about use of the reduced rate bit but no changes resulted. Christian Huitema asked about the relation between this and ECN. Steve noted that this is not a payload format question and asked for the group's opinion on going to a last call; the answer was yes.

RTP Payload Formats for MPEG-4

The next presentation was by Philippe Gentric on the generic MPEG4 payload format draft-ietf-avt-mpeg4-multisl-01.txt. He noted that although the version number on the draft is -01, the document has been edited under various names more than ten times by now. There were three minor changes regarding the payload size field in the multiple-SL mode, the DTS coding reference, and the CTS and DTS flags. In the MIME definition, the name is changed to mpeg4-generic, several parameter names are simplified, and three new parameters are added. The new parameters indicate "stream type", mode, and profile. Since this payload format has a very wide scope, the profile and mode parameters serve to restrict it. As an example, RFC 3016 video can be considered as a mode of this format. Philippe listed various editorial changes to be done next, including trying to find correct wording to describe interleaving since it can be done in many ways.

Philippe stated that this work is very mature now. It has been reviewed many times and has been stable for a while. Also, several implementations exist. He asked for the possibility to go to a WG last call. Steve Casner noted that two or three things still need to be changed and the draft as it stands can't go to a last call without these changes.

There was a question on the relation between this work and the next presentation. Philippe said this work is a general one but there is a need for defining simpler subset configurations. The next work defines such a subset.

Jan van der Meer gave a presentation on simplified transport of MPEG-4 ES with no sync layer (SL) in draft-vandermeer-mpeg-4-simple-00.txt. This specification was motivated by considerations of the Internet Streaming Media Alliance (ISMA) for MPEG4 streaming by applications that do not use SL. The MPEG4 generic payload format is usable, but it is hard to understand for people who are not familiar with SL. Moreover, some of the functionality is not required by ISMA, which needs only three configurations initially. The approach is to publish this draft, which is essentially an implementers' agreement based on a subset of the generic payload format without SL and without addressing the transcoding issues from/to SL. The described use includes all the features of the generic format except interleaving and fragmentation of access units. In order to make this specification self-contained, some optional parameters of the generic format are required to be specified. MPEG4 object descriptor framework knowledge is not needed for this restricted use. Three specific configurations are defined for CELP and AAC audio. Originally different MIME subtype names were proposed for each, but now a single MIME subtype name "mpeg4-generic" is proposed, with parameters to identify the specific configurations. Steve Casner said this approach seems to be acceptable and solicited comments from MIME experts. Jan concluded his presentation by noting that since this work is an application statement of MPEG4-generic, its maturity is comparable. Interoperability testing is currently taking place and there is industry pressure for rapid processing of this work. Steve stated that since this draft is an application statement for the generic payload format rather than a separate new payload format, the appropriate document status might be Best Current Practice rather than Proposed Standard.

Steve observed that the main open issue regarding the MPEG4 payload formats relates to the use of MIME subtype names. We want to avoid a proliferation of names and conflicts among names. Philippe reported that there were a number of comments from the MPEG Committee on this. One comment was to use the same subtype name, "MPEG4-generic", and to define mode parameters to identify subsets and configurations. Another comment was to use the configurations independent of the stream type. He mentioned that MPEG4 has an initial object descriptor (IOD) which is to be carried in SDP. The IOD specifies two streams: binary information for scene (BIFS) and object descriptor (OD). These need to be received before starting a session. There are applications where OD and BIFS are simple and static. These could be put in SDP or URLs, but this requires new media types.

Steve asked if the full generic MPEG4 payload format was too complicated to be used as it is, and therefore would not need a name for itself. He is concerned that there not be multiple names for the same bits on the wire. Philippe said Jan accomplished writing a much simpler document, but mentioned cases where the generic format needs to be used. Steve asked whether there is a MIME type conflict with RFC 3016 which also defines a similar subset. Philippe said RFC 3016 audio is different with a different name, but video overlaps. Philippe mentioned that MPEG4-generic video without any parameters defaults to RFC 3016 video. However, the latter has a different name. There was a question on which name to use for the implementations. Philippe said the bits on the wire will be the same, but implementations will need to know both names. One last comment was a request to see more examples for the usage of these drafts before going to a last call, including the data URL example Philippe showed in his slides.

RTP payload format for Erasure-Resilient Progressive media

The final presentation of the meeting was on draft-ietf-avt-uxp-00.txt by Marcel Wagner. This method provides protection of dependencies within and between frames and it can easily adapt packet sizes to network requirements. The new draft uses in-band signaling of the profiles which allows changing them. With the new draft, streams without data partitioning can also be used. A real-time implementation will be made available soon. The working group's feedback was solicited.

Philippe asked if this is intended for extremely low bit rate video because of the frame interleaving. The answer was no; with higher rates the frames may be sliced and the slices interleaved. Steve Casner asked if there were changes since the last presentation. Marcel said the changes are only for signaling and they are already included in the draft. Steve mentioned that we have several mechanisms for FEC with different trade-offs and it is ok to publish this as an RFC in addition to the others. Colin Perkins noted that this draft needs to include a clear applicability statement for when it is appropriate vs. when other schemes would be more appropriate.

There was a question on the status of ULP. It is up for the last call. Steve concluded the meeting by noting that we ran over our time again and said that for the next meeting we need to decide not to cover some of the topics and to handle them on the list instead.


An RTP Payload Format for Erasure-Resilient Transmission of Progressive Multimedia Streams
Secure RTP
Generic RTP Payload format for Time-lined Static Media
RTCP Extension for Single Source Multicast
RTCP-based Feedback: Concepts & Message Timing Rules
RTP Payload Format for MPEG1/MPEG2 Video
Principles of MP4
Use cases of ‘MPEG4-generic’ RTP payload format
MPEG-4 generic (Synch-Layer) RTP payload format
An RTP Payload Format for EVRC Speech
RTP Payload Format for Distributed Speech Recognition (DSR)
RTP Retransmission framework
Datagram Control Protocol (DCP)