Current Meeting Report

2.8.1 Audio/Video Transport (avt)

In addition to this official charter maintained by the IETF Secretariat, there is additional information about this working group on the Web at: -- RTP FAQ Page
NOTE: This charter is a snapshot of the 54th IETF Meeting in Yokohama, Japan. It may now be out-of-date.

Last Modified: 07/03/2002

Chair(s):
Stephen Casner
Colin Perkins
Transport Area Director(s):
Scott Bradner
A. Mankin
Transport Area Advisor:
A. Mankin
Mailing Lists:
General Discussion:
To Subscribe:
Description of Working Group:
The Audio/Video Transport Working Group was formed to specify a protocol for real-time transmission of audio and video over UDP and IP multicast. This is the Real-time Transport Protocol, RTP, together with its associated profile for audio/video conferences and payload format documents.

The current goals of the working group are to revise the main RTP specification and the RTP profile ready for advancement to draft standard stage (including the sampling algorithms for use with very large groups, which have been broken out into a separate document), to complete the RTP MIB, to produce a guidelines document for future developers of payload formats and to continue development of new payload formats.

The payload formats currently under discussion include a number of media specific formats (MPEG-4, DTMF, PureVoice) and FEC techniques applicable to multiple formats (parity FEC, Reed-Solomon coding).

Archive before July 2001:

Goals and Milestones:
Done  Working group last call on guidelines for payload format writers (BCP)
Done  Working group last call on parity FEC draft (standards track)
Done  Post revised RTP MIB and issue working group last call (stds track)
Done  Post revised DTMF payload format draft, ready for WG last call
Done  Post RTP implementation checklist draft
Done  Post revised RTP spec and audio/video profile
Done  Post payload format for MPEG-4 based on MPEG/IETF joint meetings
Done  Post revised draft on PureVoice (qcelp) payload format to address WG last call comments
Done  Post revised RTP membership (SSRC) sampling draft
Done  Submit RTP MIB to IESG for publication as Proposed Standard RFC
Done  Submit guidelines for payload format writers for publication as a BCP
Done  New working group last call on PureVoice payload format
Done  Working group last call on revised SSRC sampling draft (experimental)
Done  Post final revision of RTP spec and A/V profile drafts
Done  Analysis/simulation of multiplexing payload format proposals
Done  Revise MPEG-4 payload format document after implementation experience
Done  Decide how to proceed with multiplexing protocol: one generic payload format or a number of application specific formats
Done  Working group last call on RTP and A/V profile (for Draft Standard)
Done  Prepare MPEG4 implementation results ready for WG last call
Done  Post final revisions of selected multiplexing protocol draft(s)
Done  Working group last call on multiplexing payload format (stds track)
Internet-Drafts:
    draft-ietf-avt-profile-new-12.txt
    draft-ietf-avt-rtp-new-11.txt
    draft-ietf-avt-rtp-mime-06.txt
    draft-ietf-avt-rtcp-bw-05.txt
    draft-ietf-avt-rtp-cn-06.txt
    draft-ietf-avt-tcrtp-06.txt
    draft-ietf-avt-smpte292-video-06.txt
    draft-ietf-avt-crtp-enhance-04.txt
    draft-ietf-avt-ulp-05.txt
    draft-ietf-avt-rtp-selret-05.txt
    draft-ietf-avt-srtp-05.txt
    draft-ietf-avt-uxp-03.txt
    draft-ietf-avt-mpeg4-multisl-04.txt
    draft-ietf-avt-rtcp-feedback-03.txt
    draft-ietf-avt-mpeg4-simple-04.txt
    draft-ietf-avt-dsr-01.txt
    draft-ietf-avt-evrc-smv-03.txt
    draft-ietf-avt-mwpp-midi-rtp-04.txt
    draft-ietf-avt-rtcpssm-01.txt
    draft-ietf-avt-rtp-retransmission-02.txt
    draft-ietf-avt-rtp-interleave-00.txt
    draft-ietf-avt-rtp-jpeg2000-01.txt
    draft-ietf-avt-rfc2833bis-00.txt
Request For Comments:
    RFC1889 PS RTP: A Transport Protocol for Real-Time Applications
    RFC1890 PS RTP Profile for Audio and Video Conferences with Minimal Control
    RFC2035 PS RTP Payload Format for JPEG-compressed Video
    RFC2032 PS RTP payload format for H.261 video streams
    RFC2029 PS RTP Payload Format of Sun's CellB Video Encoding
    RFC2038 PS RTP Payload Format for MPEG1/MPEG2 Video
    RFC2190 PS RTP Payload Format for H.263 Video Streams
    RFC2198 PS RTP Payload for Redundant Audio Data
    RFC2250 PS RTP Payload Format for MPEG1/MPEG2 Video
    RFC2343 E RTP Payload Format for Bundled MPEG
    RFC2354 I Options for Repair of Streaming Media
    RFC2429 PS RTP Payload Format for the 1998 Version of ITU-T Rec. H.263 Video (H.263+)
    RFC2431 PS RTP Payload Format for BT.656 Video Encoding
    RFC2435 PS RTP Payload Format for JPEG-compressed Video
    RFC2508 PS Compressing IP/UDP/RTP Headers for Low-Speed Serial Links
    RFC2733 PS An RTP Payload Format for Generic Forward Error Correction
    RFC2736 BCP Guidelines for Writers of RTP Payload Format Specifications
    RFC2762 E Sampling of the Group Membership in RTP
    RFC2833 PS RTP Payload for DTMF Digits, Telephony Tones and Telephony Signals
    RFC2793 PS RTP Payload for Text Conversation
    RFC2862 PS RTP Payload Format for Real-Time Pointers
    RFC2959 PS Real-Time Transport Protocol Management Information Base
    RFC3009 PS Registration of parityfec MIME types
    RFC3016 PS RTP payload format for MPEG-4 Audio/Visual streams
    RFC3047 PS RTP Payload Format for ITU-T Recommendation G.722.1
    RFC3119 PS A More Loss-Tolerant RTP Payload Format for MP3 Audio
    RFC3158 I RTP Testing Strategies
    RFC3190 PS RTP Payload Format for 12-bit DAT, 20- and 24-bit Linear Sampled Audio
    RFC3189 PS RTP Payload Format for DV Format Video
    RFC3267 PS RTP payload format and file storage format for the Adaptive Multi-Rate (AMR) and Adaptive Multi-Rate Wideband (AMR-WB) audio codecs


    Audio/Video Transport Working Group Minutes

    Reported by Colin Perkins and Stephen Casner

    The Audio/Video Transport working group met twice at the 54th IETF meeting in Yokohama. In the first session, the group discussed RTP payload formats for MIDI, interleaved audio, AC-3 audio, JPEG 2000 video, JVT video and MPEG-4. In the second session, the discussion focused on RTP payload formats for iLBC speech, SMPTE 292M video and uncompressed video, extended RTCP reports, RTP retransmission, multiplexing based on the SSRC, and RTCP extensions to support single source multicast sessions.

    Introduction and Document Status

    The meeting opened with a status update from Steve Casner. The group has had a single RFC published since the last meeting: the RTP payload format for AMR audio (RFC 3267). There are also several drafts in the RFC editor queue awaiting publication: the MIME registration for the payload formats in the RTP profile, the SDP bandwidth modifiers for RTCP bandwidth, and the RTP payload format for comfort noise. In addition, the revised RTP specification and audio/video profile have been "tentatively approved" for publication since the last meeting.

    Two issues have been raised with the drafts approved for publication: a conflict between the G.726 payload format defined in the new profile and that in ITU-T I.366.2 Annex E, and a desire for a maximum transmission interval for comfort noise packets.

    The payload format for G.726 audio defined in the audio/video profile is little-endian; however, ITU-T I.366.2 Annex E specifies a big-endian format for the same codec. It has been requested that the audio/video profile be changed to match the ITU format. Steve Casner asked the group whether it is reasonable to make this change, since the definition in the profile has been present since 1997 and there are existing implementations. He noted that the incompatibility with the ITU format is clearly unfortunate, and that there are several possible ways to move forward: accepting the incompatibility, changing the definition of the payload format in the audio/video profile, or accepting both formats, with the ITU format registered under a separate MIME type.
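    To make the incompatibility concrete, the sketch below packs fixed-width G.726 codewords both ways. It assumes the profile's rule (the first codeword occupies the least significant bits of the octet, and later codewords fill upward) and takes "big-endian" to mean the mirror image (the first codeword occupies the most significant bits). The helper `pack_codewords` is hypothetical and not taken from either specification.

```python
def pack_codewords(codes, width, order):
    """Pack fixed-width G.726 codewords (2 to 5 bits) into octets.

    order="little": first codeword in the least significant bits (profile style).
    order="big":    first codeword in the most significant bits (assumed ITU style).
    """
    if order not in ("little", "big"):
        raise ValueError("order must be 'little' or 'big'")
    out = bytearray()
    acc, nbits = 0, 0
    mask = (1 << width) - 1
    for code in codes:
        code &= mask
        if order == "little":
            acc |= code << nbits          # next codeword goes above the bits already held
            nbits += width
            while nbits >= 8:
                out.append(acc & 0xFF)    # emit the lowest complete octet
                acc >>= 8
                nbits -= 8
        else:
            acc = (acc << width) | code   # next codeword goes below the bits already held
            nbits += width
            while nbits >= 8:
                nbits -= 8
                out.append((acc >> nbits) & 0xFF)  # emit the highest complete octet
                acc &= (1 << nbits) - 1
    if nbits:
        raise ValueError("codewords do not fill a whole number of octets")
    return bytes(out)
```

    With the four 4-bit codewords 1, 2, 3, 4, the little-endian packing yields octets 0x21 0x43 and the big-endian packing yields 0x12 0x34, so a G.726-32 receiver that assumes the wrong packing swaps every adjacent pair of samples.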

    Dave Oran suggested a fourth option: "take the IETF format and declare that to be the second MIME type", with the ITU format taking the place of the current definition, noting that "we get to decide whose ox gets gored". Steve Casner thought this was not reasonable, since such a change would break compatibility with implementations that use the existing MIME type, and noted that if we accept both formats, we need to assign a new MIME type to the ITU format. Dave Oran noted that there will be some breakage whatever happens, since several implementations use the ITU packing but refer to it with the IETF MIME type. There was no resolution; comments are solicited.

    The second issue is the possibility of defining a maximum inter-packet transmission interval for comfort noise packets, to act as a liveness indicator. This raises two new questions: what should the interval be, and should it be specified in the payload format rather than in an application-specific document? Steve Casner expressed the opinion that such a restriction was application-specific and should not be in the payload format. He also noted that RTCP is a good indicator of system liveness. Input was solicited from the group, but there were no comments.
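    The two options can be contrasted in a few lines. This is a sketch only: `max_cn_interval` is hypothetical because the group set no value, and the RTCP check is modelled on the RTP specification's member timeout of five reporting intervals.

```python
RTCP_TIMEOUT_MULTIPLIER = 5  # reporting intervals, modelled on the RTP member-timeout rule


def alive_by_media(now: float, last_media_arrival: float, max_cn_interval: float) -> bool:
    """Payload-format approach: a CN packet must arrive at least every max_cn_interval s.

    max_cn_interval is a hypothetical parameter; no value was agreed.
    """
    return now - last_media_arrival <= max_cn_interval


def alive_by_rtcp(now: float, last_rtcp_arrival: float, rtcp_interval: float) -> bool:
    """RTCP approach: the sender is considered live while its reports keep arriving."""
    return now - last_rtcp_arrival <= RTCP_TIMEOUT_MULTIPLIER * rtcp_interval


if __name__ == "__main__":
    # 10 s into a silence period: no CN packet for 10 s, last RTCP report 3 s ago.
    print(alive_by_media(now=110.0, last_media_arrival=100.0, max_cn_interval=5.0))
    print(alive_by_rtcp(now=110.0, last_rtcp_arrival=107.0, rtcp_interval=5.0))
```

    In this example the media-based check declares the sender dead while the RTCP-based check still sees it as live, which is the behaviour Steve Casner's argument relies on: RTCP keeps flowing during silence, so no payload-specific timer is needed.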

    Several drafts have been submitted to the IESG but have not yet been accepted for publication: enhanced CRTP and TCRTP, the secure RTP profile, and the payload format for EVRC/SMV speech. Several drafts are in working group last call: RTCP feedback, the MPEG-4 payload format, the payload format for distributed speech recognition, and the ULP and UXP FEC mechanisms.

    Regarding the ULP draft, Steve Casner noted that the behaviour of the X and P bits in ULP packets does not follow the usual rules: these bits are the XOR of the corresponding bits in the protected packets, rather than describing the FEC packet itself. It turns out that this behaviour was inherited from RFC 2733, which ULP extends, and it implies that RTP header validation must be special-cased for FEC packets. It was noted that this is not a good design, and that neither the chairs nor the author of RFC 2733 could remember a reason why these fields were redesigned (except, perhaps, to save a couple of bits). Accordingly, Steve Casner proposed to redesign the ULP format to address this issue, as a replacement for RFC 2733. There are believed to be implementations of RFC 2733 that might be affected by such a change, and input from implementors is solicited.
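    The validation problem can be shown with a short sketch. It computes the P and X bits the way an RFC 2733-style FEC header does (XOR across the protected packets) and applies a generic receiver padding check of the kind the RTP specification recommends. Function names are hypothetical, and the check is simplified to the padding count only.

```python
RTP_HEADER_LEN = 12  # fixed RTP header, no CSRCs


def xor_p_x_bits(protected_packets):
    """Return the (P, X) bits an RFC 2733-style FEC header would carry.

    Each bit is the XOR of that bit across the protected media packets,
    so it says nothing about padding or extensions in the FEC packet itself.
    """
    p = x = 0
    for pkt in protected_packets:
        p ^= (pkt[0] >> 5) & 1  # padding bit
        x ^= (pkt[0] >> 4) & 1  # extension bit
    return p, x


def padding_is_valid(pkt: bytes) -> bool:
    """Generic receiver check: if P is set, the last octet must be a sane padding count."""
    if not (pkt[0] >> 5) & 1:
        return True
    pad = pkt[-1]
    return 0 < pad <= len(pkt) - RTP_HEADER_LEN


if __name__ == "__main__":
    media_a = bytes([0xA0]) + bytes(11) + b"\x01\x02\x02"  # P=1, two octets of padding
    media_b = bytes([0x80]) + bytes(11) + b"\x05\x06\x07"  # P=0
    p, x = xor_p_x_bits([media_a, media_b])
    fec_payload = bytes(a ^ b for a, b in zip(media_a[12:], media_b[12:]))
    fec = bytes([0x80 | (p << 5) | (x << 4)]) + bytes(11) + fec_payload
    # The media packet passes; the FEC packet has P=1 but no real padding, so it fails.
    print(padding_is_valid(media_a), padding_is_valid(fec))
```

    A receiver running the generic check would discard the FEC packet, which is why the check has to be skipped for FEC packets under the current design.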


    There is a new draft on the payload format for DTMF tones to update RFC 2833 (draft-ietf-avt-rfc2833bis-00.txt). It adds several tones and events that were missed, and clarifies a small number of points. This revision is intended as a short-term effort, with the goal of producing a Draft Standard RFC with minimal changes. Interoperability tests will be required to complete this work, and a volunteer was solicited to coordinate interoperability testing.
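    For reference, the sketch below packs the four-octet telephone-event payload from RFC 2833 that the bis draft builds on: an 8-bit event code, an end (E) bit, a reserved (R) bit, a 6-bit volume (dBm0 with the sign dropped) and a 16-bit duration in RTP timestamp units. The helper names are hypothetical; only the field layout comes from RFC 2833.

```python
import struct

# RFC 2833 event codes for DTMF digits.
DTMF_EVENTS = {**{str(d): d for d in range(10)}, "*": 10, "#": 11,
               "A": 12, "B": 13, "C": 14, "D": 15}


def pack_telephone_event(event: int, end: bool, volume: int, duration: int) -> bytes:
    """Build the 4-octet event payload: event | E R volume | duration."""
    if not 0 <= event <= 0xFF:
        raise ValueError("event must fit in 8 bits")
    if not 0 <= volume <= 0x3F:
        raise ValueError("volume (dBm0, sign dropped) must fit in 6 bits")
    if not 0 <= duration <= 0xFFFF:
        raise ValueError("duration must fit in 16 bits")
    flags_volume = (0x80 if end else 0x00) | volume  # E bit; R bit left as 0
    return struct.pack("!BBH", event, flags_volume, duration)


def unpack_telephone_event(payload: bytes):
    """Return (event, end, volume, duration)."""
    event, flags_volume, duration = struct.unpack("!BBH", payload[:4])
    return event, bool(flags_volume & 0x80), flags_volume & 0x3F, duration


if __name__ == "__main__":
    # Final packet for digit 5 at -10 dBm0, lasting 100 ms with an 8000 Hz RTP clock.
    print(pack_telephone_event(DTMF_EVENTS["5"], True, 10, 800).hex())
```

    Interoperability testing of the bis draft would largely exercise exactly these fields, for example whether implementations agree on when to set the E bit and how the duration grows across repeated packets for a single event.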


    Colin Perkins, sitting in for John Lazzaro, gave an update on the MIDI Wire Protocol Packetization (draft-ietf-avt-mwpp-midi-rtp-04.txt). This draft has been discussed on the list, and is believed to be essentially complete in terms of the payload format and recovery journal structures. The next steps are to reality-test the SDP parameters, updating them if necessary, and to write drafts describing how the recovery journal can be used in particular scenarios (intended as BCPs to accompany the main specification). Steve Casner asked whether the aim of publishing these drafts as BCPs was to make that information available while satisfying concerns about including it in the main specification. The answer was yes: it is not appropriate to put a single algorithm in the payload format, since multiple algorithms may be suitable, depending on how the format is used and on the desired degree of resilience.

    Colin noted that work is proceeding to check the format for correctness and update the implementation. In addition, companion drafts are planned to describe complete systems using MIDI with RTSP and SIP, fleshing out the complete scenarios.

    The definition of an IANA registry for render parameters was highlighted as an issue: who controls the definition of new values, and what sort of specification should be required for them? Colin Perkins suggested that requiring an RFC for each parameter is probably overkill, since new parameters are expected to be common, but that a stable and public specification is appropriate. Colin also pointed to the summary of open issues that was sent to the mailing list, and solicited input.

    Interleaved Audio

    Colin Perkins, sitting in for Orion Hodson, introduced an RTP payload format for interleaved audio (draft-ietf-avt-rtp-interleave-00.txt). There are several existing payload formats that support interleaving; the intention of this draft is to produce a general-purpose solution rather than having separate, subtly different interleavers for each new codec. The new format has relatively low overhead, works with audio codecs with fixed or self-describing frame sizes, supports codec changes mid-stream and codecs that employ silence suppression, and is reasonably easy to implement. The proposed payload format uses a two-octet header indicating the interleaver cycle and index, plus the original payload type, much in the style of the RFC 2198 redundant audio format. The RTP timestamp is specified to be the timestamp of the frames before interleaving, to keep the timestamp sequence and allow header compression.
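    A generic block interleaver illustrates why interleaving helps. This is not the draft's algorithm or header layout: `stride` (packets per interleaving cycle) and the per-packet index are hypothetical stand-ins for the cycle and index fields the draft carries in its two-octet header.

```python
def interleave(frames, stride):
    """Split one interleaving cycle into `stride` packets.

    Packet i carries frames i, i + stride, i + 2*stride, ... so that losing one
    packet leaves isolated one-frame gaps instead of a burst.
    """
    if len(frames) % stride:
        raise ValueError("cycle length must be a multiple of the stride")
    return [frames[i::stride] for i in range(stride)]


def deinterleave(received, stride, cycle_len):
    """Rebuild the cycle from {packet index: frames}; missing frames become None."""
    out = [None] * cycle_len
    for index, frames in received.items():
        for k, frame in enumerate(frames):
            out[index + k * stride] = frame
    return out


if __name__ == "__main__":
    packets = interleave(list(range(8)), 4)  # [[0, 4], [1, 5], [2, 6], [3, 7]]
    # Packet 1 is lost: frames 1 and 5 are missing, but no two adjacent frames are.
    received = {i: p for i, p in enumerate(packets) if i != 1}
    print(deinterleave(received, 4, 8))
```

    Each isolated gap can then be concealed from its neighbours, which is the resilience benefit the draft is after; the cost is the added delay of buffering a whole cycle before playout.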

    Stephan Wenger noted that this format is useful for more than audio, since it works for anything with a fixed frame rate and size.

    Steve Casner noted that he doesn't particularly like the idea, because of the overhead of carrying the additional payload type in each packet, plus the notion that we're adding an additional layer of indirection to hide the original payload type. It may still be reasonable to define an interleaving payload format, but the efficiency gains of not including the payload type in-band may make it worth defining codec-specific formats also. Colin Perkins noted that it may be possible to signal the inner-payload type out of band, as a way of reducing the overhead.

    Magnus Westerlund noted that there are potential issues with comfort noise, as discussed on the mailing list. Colin Perkins noted that operation with silence suppression may also not be well specified.

    JPEG 2000 video

    Eric Edwards presented the RTP payload format for JPEG 2000 video streams (draft-ietf-avt-rtp-jpeg2000-01.txt). There have been a number of changes in the draft since the last meeting, as a result of the feedback received from the IETF and the JPEG committee.

    The number of RTP packet types has been reduced, with the opaque type field in the payload header being changed to a set of explicit flags. This was queried by Steve Casner, who noted that the change does not reduce the number of modes of operation, it just represents them in a different way: his concern was more about the number of modes and the complexity they introduced. Eric noted that the types map directly onto the codec, and hence the authors believe the set of flags is appropriate.

    Support for tiling small sized parts has been specified, to improve efficiency with certain classes of operation.

    A number of optional fields have been added to support the addition of JPIP at some time in the future. Steve Casner and Colin Perkins expressed concern over this, since it is not clear if JPIP is suitable for use with RTP. It may be more appropriate to extend the protocol at a later date, rather than to add fields now in the hope that they are suitable. Steve Casner noted that having an undefined field in the standard is a problem: a new RFC, registering this option with IANA, may be better. This payload format could define the extensibility mechanism in its IANA considerations section, but leave actual extensions for future specification.

    At the previous meeting, it was suggested that the authors investigate the H.263 picture header redundancy technique (RFC 2429) as a possible means of improving the resilience of this format. Eric reported that, because of the possible size of the Main Header of JPEG 2000, the authors believe this is not appropriate. Steve Casner noted that the real issue may be that having a large amount of state which needs to be maintained makes the codec more fragile (since, unlike H.263, we can't repeat it often).

    Eric noted that there is an optional marker segment that can be used to help resilience, so this fragility is not necessarily a problem. Stephan Wenger noted that the RFC 2429 repetition feature allows repetition of parts of the header, if that is useful. Eric suggested giving examples of resilience using the optional header. Philippe Gentric asked if SDP might be an appropriate means of conveying this information, but Steve Casner noted that this is only appropriate if the header is static for the entire session.

    At the previous meeting, the redundant audio scheme from RFC 2198 was also proposed as a resilience mechanism, but the authors did not find that appropriate either.

    It was also suggested that careful ordering of packets might result in a more robust transport, since errors could be concealed by careful choice of update order. This is possible, and shouldn't require changes to the payload format, but it does require additional buffering at the receiver.

    Eric noted that a patent application has been filed in Japan that covers this format. If the patent is granted, it will be licensed under reasonable and non-discriminatory terms. Steve Casner noted that the IPR statement needs to go on the IETF website, rather than in the drafts, and there is specific wording that should be included in the draft.

    There are a number of open issues: should support for in-band priority mapping tables be included in the specification? Steve Casner asked who would look at it: is the goal to have the network do something different? He noted that there is no point putting information in the packets unless it's going to be useful: "If you're not sure if you'll need it, don't put it in".

    Eric noted that the authors have an implementation of the format, being used for testing. They will produce one more version of the draft before the next meeting, and they expect that to be ready for last call.

    AC-3 audio

    Jason Flaks presented the RTP payload format for the AC-3 audio codec (draft-flaks-avt-rtp-ac3-02.txt). There have been significant changes since the last version, primarily to improve the fragmentation and error resilience (this is important, since most AC-3 frames exceed the Ethernet MTU).

    Fragmentation has been improved by noting that the first 5/8ths of an AC-3 frame are independently decodable. This provides a natural fragmentation point, which is resilient to packet loss, and is now supported by the payload format. This new fragmentation scheme also gives the opportunity for redundant transmission of fragments, by sending a channel-reduced version of the data in the following packet. Colin Perkins suggested that delaying the redundant data by more than one packet might improve performance in the presence of burst losses, and so might be appropriate to consider.
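    The sketch below shows the fragmentation point and the effect of the redundancy delay Colin Perkins suggested. It assumes the 5/8 point is computed in 16-bit words as (words >> 1) + (words >> 3), following our understanding of the AC-3 CRC1 rule, and it models redundancy abstractly: packet p carries data unit p plus a (channel-reduced) copy of unit p - delay. Names are hypothetical.

```python
def split_at_five_eighths(frame: bytes):
    """Split an AC-3 frame at its 5/8 point, computed in 16-bit words."""
    if len(frame) % 2:
        raise ValueError("AC-3 frames are a whole number of 16-bit words")
    words = len(frame) // 2
    cut = ((words >> 1) + (words >> 3)) * 2  # convert words back to octets
    return frame[:cut], frame[cut:]


def unrecoverable_units(n_units, delay, lost_packets):
    """Units lost when packet p carries unit p plus a redundant copy of unit p - delay.

    A unit is gone only if both its primary packet and the packet `delay`
    positions later are lost.
    """
    return [u for u in range(n_units) if u in lost_packets and u + delay in lost_packets]


if __name__ == "__main__":
    head, tail = split_at_five_eighths(bytes(1536))  # a 1536-octet frame (384 kbit/s at 48 kHz)
    print(len(head), len(tail))
    burst = {3, 4}                                   # two consecutive packets lost
    print(unrecoverable_units(10, 1, burst), unrecoverable_units(10, 2, burst))
```

    With a delay of one packet, a two-packet burst destroys both copies of one unit; with a delay of two, every unit survives the same burst, at the cost of one more packet of receive buffering.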

    Jason noted that the number-of-data-units field was added in case it was useful, but that it is unlikely to be used. Steve noted that there are networks with large MTUs, but that the real question is whether the packet rate and header overhead are a problem. Aggregation is useful when you can tolerate the latency and wish to reduce the packet rate; if that's not the case, there is no need to aggregate multiple frames per packet.
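    The trade-off Steve described can be quantified with a worked calculation. It assumes 1536-sample AC-3 frames at 48 kHz (32 ms each) and counts only IPv4, UDP and fixed RTP headers; the AC-3 payload header is ignored.

```python
AC3_SAMPLES_PER_FRAME = 1536            # six audio blocks of 256 samples
IP_UDP_RTP_HEADER_OCTETS = 20 + 8 + 12  # IPv4 + UDP + fixed RTP header, no options


def aggregation_tradeoff(frames_per_packet: int, sample_rate: int = 48000):
    """Return (packets/s, header overhead in bit/s, added packetization delay in s)."""
    frame_seconds = AC3_SAMPLES_PER_FRAME / sample_rate
    packet_rate = 1.0 / (frame_seconds * frames_per_packet)
    header_bps = packet_rate * IP_UDP_RTP_HEADER_OCTETS * 8
    added_delay = (frames_per_packet - 1) * frame_seconds  # waiting for later frames
    return packet_rate, header_bps, added_delay


if __name__ == "__main__":
    for n in (1, 2, 4):
        rate, bps, delay = aggregation_tradeoff(n)
        print(f"{n} frame(s)/packet: {rate:.2f} pkt/s, {bps:.0f} bit/s headers, +{delay * 1000:.0f} ms")
```

    One frame per packet costs about 31 packets/s and 10 kbit/s of headers; four frames per packet cut that to about 7.8 packets/s and 2.5 kbit/s but add 96 ms of delay. Against an AC-3 stream of several hundred kbit/s, the saving is small, which supports the view that aggregation matters only where packet rate itself is the constraint.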

    After outlining the changes, Jason asked that the draft move to the standards track at some stage. Steve Casner noted that the draft was already accepted as a working group task (the name didn't change this time, due to the deadline). Advancing the draft is simply a matter of completing the work, at which time it can advance to RFC status.

    JVT video

    Stephan Wenger updated the group on the RTP payload format for JVT video (draft-wenger-avt-rtp-jvt-01.txt). There have been a number of changes since the last meeting, including making the RTP timestamp match the presentation timestamp, using a fixed 90 kHz clock, and using two types of aggregation packets (STAP and MTAP). The JVT spec itself has many changes in the video coding layer, a new "disposable" flag for packets, and the introduction of a picture layer (it was noted that the picture layer is controversial, and may be removed in future), and the draft has been updated to track these. There are several open issues: how efficient are MTAPs? Is a 16-bit timestamp offset in the MTAP sufficient? Is it appropriate for the RTP marker bit to represent end of slice? (There is also the issue of possible alignment with the MPEG-4 payload format.)

    Stephan noted the issue of IPR on the parameter set concept, raised by Reha Civanlar on the mailing list.

    Regarding the 16-bit timestamp offset, Philippe Gentric noted that it is "both not enough and too much" and should be configurable. Colin Perkins commented that a variable-length encoding of the timestamp might be used (much as in CRTP). Philippe also asked whether the MTAP timestamp offset has to match the rate of the RTP clock (this is the reason for the 2/3-second offset limit). Stephan objected to this idea, because it causes a loss of precision in gateways and adds considerable complexity. Steve Casner also noted the issue of precision in the low bits of the timestamp as being important. Philippe noted that MTAPs can be used for interleaving, and wondered whether the offset size limitation was problematic for this use. Stephan believes not, but this may depend on the application (e.g. Philippe noted that streaming applications may have very long interleaving periods). Magnus Westerlund voiced support for variable-length encoding of the timestamp, to solve this problem.
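    The numbers behind this discussion are easy to check: a 16-bit offset at the fixed 90 kHz clock spans at most 65535/90000 ≈ 0.73 s. The sketch also shows a prefix-coded variable-length offset in the spirit of the CRTP suggestion (1 to 4 octets, with leading bits 0, 10, 110 or 111 selecting 7, 14, 21 or 29 value bits). The encoding is an illustration chosen for this sketch, not the exact RFC 2508 format or anything proposed in the draft.

```python
def max_offset_seconds(bits: int, clock_hz: int = 90000) -> float:
    """Largest timestamp offset a fixed-width unsigned field can express."""
    return ((1 << bits) - 1) / clock_hz


def encode_offset(value: int) -> bytes:
    """Prefix-coded offset: 0+7 bits, 10+14 bits, 110+21 bits, 111+29 bits."""
    if value < 0:
        raise ValueError("offset must be non-negative")
    if value < 1 << 7:
        return bytes([value])
    if value < 1 << 14:
        return (0x8000 | value).to_bytes(2, "big")
    if value < 1 << 21:
        return (0xC00000 | value).to_bytes(3, "big")
    if value < 1 << 29:
        return (0xE0000000 | value).to_bytes(4, "big")
    raise ValueError("offset too large for a 29-bit field")


def decode_offset(data: bytes):
    """Return (value, octets consumed)."""
    first = data[0]
    if (first & 0x80) == 0:
        return first, 1
    if (first & 0xC0) == 0x80:
        return int.from_bytes(data[:2], "big") & 0x3FFF, 2
    if (first & 0xE0) == 0xC0:
        return int.from_bytes(data[:3], "big") & 0x1FFFFF, 3
    return int.from_bytes(data[:4], "big") & 0x1FFFFFFF, 4
```

    Small offsets, the common case for closely spaced NAL units, cost a single octet, while the 29-bit form reaches roughly 99 minutes at 90 kHz, which addresses both halves of "both not enough and too much" without touching the precision of the low-order timestamp bits.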

    Regarding the marker bit, Stephan noted that there is no need for an end of picture signal in JVT. Accordingly, it would be helpful to signal end of slice or end of NALU (if fragmented) instead. Is this an acceptable use of the marker bit? Steve Casner agreed that signalling the end of an application data unit, even if that is not end-of-picture, is appropriate.

    The final issue was whether to allow media-unaware fragmentation, signalled by the marker bit, in the payload format. It is clearly better to fragment on application-meaningful boundaries, but there was no real objection to adding media-unaware fragmentation, so long as it can be done in a clean way.
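    A minimal sketch of media-unaware fragmentation, assuming (as discussed) that the marker bit is set only on the last fragment of a NAL unit. It assumes in-order, loss-free delivery for brevity; a real receiver would use RTP sequence numbers and discard a partially received unit after a loss. Function names are hypothetical.

```python
def fragment(unit: bytes, max_payload: int):
    """Cut a NAL unit at fixed byte offsets; only the last fragment has the marker set."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    chunks = [unit[i:i + max_payload] for i in range(0, len(unit), max_payload)] or [b""]
    return [(chunk, i == len(chunks) - 1) for i, chunk in enumerate(chunks)]


def reassemble(packets):
    """Concatenate payloads until a packet with the marker bit closes a unit."""
    units, buf = [], bytearray()
    for payload, marker in packets:
        buf += payload
        if marker:
            units.append(bytes(buf))
            buf.clear()
    return units
```

    Cutting at arbitrary byte offsets is what makes this "clean" but media-unaware: a lost middle fragment makes the whole NAL unit undecodable, which is why fragmenting on application-meaningful boundaries remains the preferred option.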

    Steve Casner called a "hum" on making this an AVT work item, after asking about the status of the draft within JVT. The room expressed support for taking this as a work item.

    MPEG-4 payload format and related MIME types

    Philippe Gentric described progress in the RTP payload format for MPEG-4 (draft-ietf-avt-mpeg4-simple-04.txt) and a related draft containing MIME registrations (draft-lim-mpeg4-mime-00.txt). The payload format is in working group last call and has also been reviewed by MPEG. Since the previous meeting, the draft has been extended to transport MPEG-4 System streams (still no SL) and has two new optional fields in the AU header section to transport a random access point flag and a stream-state counter. The remaining open issue is the naming of the "profile" MIME parameter, which is misleading. There is ongoing discussion to change this to either "MaxInterleaveDelay", "maxInterleave" or "maxptime" for clarity and compatibility with other formats (however, the ISMA uses the existing name, so there may be compatibility issues with existing products if a change is made).
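    To illustrate what an AU header section looks like on the wire, the sketch below builds one modelled on the layout later published in RFC 3640: a 16-bit AU-headers-length (in bits), then per access unit an AU-size followed by an AU-Index (first AU) or AU-Index-delta (later AUs), padded to an octet boundary. The -04 draft discussed here may differ in detail; the field widths (13 and 3 bits) are example configuration values, and the optional random access point flag and stream-state fields are omitted.

```python
def au_header_section(au_sizes, first_index=0, size_length=13,
                      index_length=3, index_delta_length=3):
    """Build AU-headers-length plus AU-headers for consecutive access units."""
    acc, nbits = 0, 0
    for i, size in enumerate(au_sizes):
        if not 0 <= size < (1 << size_length):
            raise ValueError("AU size does not fit in size_length bits")
        if i == 0:
            idx, idx_bits = first_index, index_length
        else:
            idx, idx_bits = 0, index_delta_length  # delta 0: the next AU in sequence
        if not 0 <= idx < (1 << idx_bits):
            raise ValueError("index does not fit in its field")
        acc = (acc << size_length) | size
        acc = (acc << idx_bits) | idx
        nbits += size_length + idx_bits
    pad = (-nbits) % 8  # pad the AU-headers to an octet boundary
    body = (acc << pad).to_bytes((nbits + pad) // 8, "big")
    return nbits.to_bytes(2, "big") + body  # AU-headers-length counts bits, excluding padding
```

    For two AUs of 100 and 200 octets this yields 00 20 03 20 06 40: a 32-bit header section followed by two 16-bit AU headers. New optional fields such as the random access point flag would add bits to each AU header in the same way, which is why the draft signals their presence and width out of band.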

    Philippe also described draft-lim-mpeg4-mime-00.txt, which is an evolution of draft-singer-mpeg4-ip-04.txt. The informative parts of the Singer draft will be published as part of MPEG-4, with the MIME types being extracted into this new draft for publication by the IETF. Comments are solicited.

    MPEG-4 FlexMux

    Catherine Roux presented the RTP payload for MPEG-4 FlexMultiplexed streams (draft-curet-avt-rtp-mpeg4-flexmux-03.txt). A number of open issues exist, regarding the relation between the clock references and RTP timestamp, the ability to synchronise FlexMux streams with non-FlexMux content using RTP, the ability to robustly signal FlexMux configuration, the SDP parameters and the applicability of the format.

    The draft now considers the RTP timestamp to be the send time of the packet. However, it is still not clear how to synchronise MPEG-4 FlexMux content with non-MPEG content transported in RTP, due to the lack of an appropriate reference clock. Steve Casner recognised that there may not be a clean solution to this problem, and that the applicability statement for this payload format may have to document that synchronisation with normal RTP content is not possible.

    To improve robustness of FlexMux configuration, the proposal is to send repeated copies of the signalling, in advance of the change, to provide probabilistic reliability. This seems reasonable, provided the limited guarantee is noted.
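    The "limited guarantee" can be stated numerically. Assuming independent packet losses (burst losses make things worse), the chance that at least one of k repeated copies of the configuration arrives is 1 - p^k for loss rate p. The helpers are hypothetical.

```python
def delivery_probability(loss_rate: float, copies: int) -> float:
    """Chance that at least one of `copies` repetitions arrives, assuming independent losses."""
    if not 0.0 <= loss_rate < 1.0:
        raise ValueError("loss_rate must be in [0, 1)")
    return 1.0 - loss_rate ** copies


def copies_needed(loss_rate: float, target: float) -> int:
    """Smallest number of repetitions that meets the target delivery probability."""
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    copies = 1
    while delivery_probability(loss_rate, copies) < target:
        copies += 1
    return copies
```

    At 10% loss, three copies give about 99.9% delivery, but no finite number of copies reaches certainty, and correlated losses can defeat repetitions sent close together; spacing the copies out in time before the change helps with the latter.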

    There is also the issue of error sensitive streams, such as systems streams, which can be transported in FlexMux. One solution is to carousel the data, but TCP may also be used. There are significant synchronisation issues with the use of TCP as part of a presentation, which are not yet addressed.

    Use of a=fmtp to signal FlexMux parameters was briefly explained. Nothing has changed since the previous meeting, except that the type will be registered as "audio", "video" or "application" to match the MPEG-4 payload format.

    There are still significant open issues with this format, which have to be addressed before it can advance.


    The first session concluded with demonstrations of the JPEG-2000 and AC-3 payload formats.

    [At this point, please adjust your locale dial from en_GB to en_US.]

    Intra-Frame Request Signaling

    The second AVT session began with a discussion deferred from the MMUSIC working group session earlier in the day. In a multi-party conferencing system with switched video, a receiver that begins receiving a new source needs to signal to the sender that a full intra-coded frame is required to begin decoding. The question is whether this signal should be passed in SDP using the offer/answer method, or in RTCP.

    We reached a common understanding on two sub-issues: 1) no matter how the signaling is done, the spec cannot say the sender MUST send an intra-frame, because this is dependent upon congestion conditions, but the sender MUST be prepared to receive the signal and respond with the intra-frame if it is able; and 2) the request for a full intra-frame is distinct from the loss-of-reference-picture indication that is already specified in the RTCP Feedback Profile, because the sender's response may be different, and therefore two different signals are required (although both may use the same signaling channel).
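    If RTCP is chosen, the intra-frame request becomes one more payload-specific feedback message alongside the existing loss indication. The sketch builds that common header as it was later published in RFC 4585 (the -03 draft may differ in detail), using Picture Loss Indication as the existing loss signal. A full intra-frame request would be a separate message with its own FMT value; no value is assumed here, so a caller would pass whatever the eventual specification assigns.

```python
import struct

RTCP_PSFB = 206  # payload-specific feedback packet type (RFC 4585)
FMT_PLI = 1      # Picture Loss Indication


def build_psfb(fmt: int, sender_ssrc: int, media_ssrc: int, fci: bytes = b"") -> bytes:
    """Build a payload-specific RTCP feedback packet with the given FMT and FCI."""
    if not 0 <= fmt <= 31:
        raise ValueError("FMT is a 5-bit field")
    if len(fci) % 4:
        raise ValueError("FCI must be a whole number of 32-bit words")
    length_words = (12 + len(fci)) // 4 - 1  # RTCP length: 32-bit words minus one
    first_octet = (2 << 6) | fmt             # V=2, P=0, FMT
    header = struct.pack("!BBHII", first_octet, RTCP_PSFB, length_words,
                         sender_ssrc, media_ssrc)
    return header + fci
```

    A PLI built this way is 12 octets with no feedback control information. Keeping the intra-frame request as a distinct FMT, rather than overloading PLI, matches the group's conclusion that the sender's response to the two signals may differ.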

    Roni Even stated a preference to use SDP for the new indication so it can go together with the "freeze" command that would be sent that way. But either way, where would the full process be described? His contribution to MMUSIC, using SDP, gave such a description. Joerg Ott believes the signal belongs in RTCP, and suggested that all we need is a 3-page Internet-Draft to specify the semantics of an additional RTCP request to be used under the RTCP Feedback Profile.

    Jonathan Rosenberg understood the consensus from MMUSIC to be that this signal was not appropriate as an SDP parameter because it is not a property of the media stream. One of the fundamental properties of the offer/answer model is that the attributes of the session have no dependence on history. To use offer/answer, you would have to "turn on" the intra-frame transmission and then "turn it off" with another re-INVITE.

    Dave Oran identified a conflict between this inherently unidirectional signal and the bidirectional protocols in which SDP is usually embedded. If the protocol is running stop-and-wait, the timing of the requests and responses can become completely out-of-sync.

    Roni concluded that we need to go back to MMUSIC to discuss it again because we still don't agree if this operation is a changing of the stream or not. Another participant commented that we have a conflict between what the IETF wants to do and what the implementers want to do. The implementers will go their own way.

    Steve Casner noted that the use of RTCP for this signal was proposed at the last IETF, but we stumbled then because of disagreement on the "MUST" issue. Otherwise, we might have had a solution then. He ended the discussion and summarized the output from AVT to MMUSIC as follows: having both of these signals carried in RTCP is a fine idea that fits in the RTCP feedback scheme if MMUSIC concludes that SDP is not the appropriate method.

    iLBC Speech

    Alan Duric presented an update of two drafts on the iLBC speech codec and its associated payload format in draft-andersen-ilbc-01.txt and draft-duric-rtp-ilbc-01.txt, respectively. The changes in the codec since the -00 version were to rearrange the bit packing for Unequal Level Protection and reduce the total number of bits from 419 to 416 so the result is 8+12+32=52 bytes for the three decreasing priority levels. There have also been some revisions to the code and descriptions in the draft based on feedback from implementers. No technical changes in the payload format were mentioned.
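    The frame arithmetic is easy to verify: 416 bits is 52 octets, split 8 + 12 + 32 across the three priority levels, and at one frame every 30 ms that is about 13.9 kbit/s. The sketch assumes the three classes are laid out contiguously, highest priority first; the codec draft's actual bit allocation is not reproduced.

```python
FRAME_MS = 30               # current iLBC frame length
CLASS_OCTETS = (8, 12, 32)  # three priority levels, highest first (416 bits total)


def split_priority_classes(frame: bytes):
    """Slice one encoded frame into its three priority classes."""
    if len(frame) != sum(CLASS_OCTETS):
        raise ValueError("expected a 52-octet frame")
    classes, pos = [], 0
    for n in CLASS_OCTETS:
        classes.append(frame[pos:pos + n])
        pos += n
    return classes


def bitrate_bps(frame_octets: int = sum(CLASS_OCTETS), frame_ms: int = FRAME_MS) -> float:
    """Codec bit rate, excluding IP/UDP/RTP headers."""
    return frame_octets * 8 * 1000 / frame_ms
```

    Dropping from 419 to 416 bits is what makes each class, and the frame as a whole, end on an octet boundary, so an unequal-protection scheme can protect the first 8 or 20 octets without splitting bytes.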

    Alan gave a brief description of the coding steps as requested by some participants at the previous AVT meeting -- see the presentation.

    Planned future work is to develop a 20 ms frame option (vs. 30 ms) and to add voice activity detection and comfort noise generation. They will also be optimizing some parts of the algorithm to reduce complexity. Alan expected to have some testing results from one university to present, but will send these to the mailing list later. The summary is that it is working quite nicely due to the simple payload structure. The executable for a demo SIP client with the iLBC codec is available by request from

    Uncompressed Video

    Ladan Gharai discussed two payload formats for uncompressed video. The first is draft-ietf-avt-smpte292-video-06.txt which has been in process for some time; it is for constant-rate video, essentially circuit emulation with all bits from a SMPTE 292M stream being transported. It is designed to interoperate with existing broadcast equipment. The second, draft-gharai-avt-uncomp-video-00.txt, is a new payload format for a more native (to RTP) packetization that is flexible over a wide range of uncompressed video formats and sends only the active video (no blanking). The choice between the two formats depends upon the application.

    The main change in the smpte292 draft was the definition of a new term "pgroup" which is the smallest number of pixels that keeps together related Y, Cb and Cr values and fills an integral number of octets. The purpose of defining the pgroup is to specify where fragmentation should occur (between pgroups). The payload header has not changed since the last draft, and no further major technical changes are expected. The authors plan to submit another draft revision by August 15 with additional technical rationale, and then would like to go to working group last call.
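    The pgroup rule can be computed directly: take the smallest set of pixels that carries every colour component (two pixels, Cb Y Cr Y, for 4:2:2; one pixel for 4:4:4) and repeat it until the bit count is a multiple of 8. The sketch then splits a scan line only at pgroup boundaries. The sampling structures, the 1920-pixel line and the 1400-octet payload budget are example inputs, not values from either draft.

```python
from math import gcd


def pgroup(samples_per_unit: int, pixels_per_unit: int, bits_per_sample: int):
    """Smallest (pixels, octets) group keeping related samples together on an octet boundary."""
    unit_bits = samples_per_unit * bits_per_sample
    units = 8 // gcd(unit_bits, 8)  # units needed to reach a multiple of 8 bits
    return units * pixels_per_unit, units * unit_bits // 8


def fragment_line(line_pixels: int, pg_pixels: int, pg_octets: int, max_payload: int):
    """Split one scan line into (pixel offset, pixel count) pieces at pgroup boundaries."""
    per_packet = max_payload // pg_octets
    if per_packet == 0:
        raise ValueError("payload budget smaller than one pgroup")
    if line_pixels % pg_pixels:
        raise ValueError("line length must be a whole number of pgroups")
    step = per_packet * pg_pixels
    return [(off, min(step, line_pixels - off)) for off in range(0, line_pixels, step)]


if __name__ == "__main__":
    print(pgroup(4, 2, 10))                  # 10-bit 4:2:2: 2 pixels in 5 octets
    print(pgroup(3, 1, 10))                  # 10-bit 4:4:4: 4 pixels in 15 octets
    print(fragment_line(1920, 2, 5, 1400))   # 280 pgroups (560 pixels) per packet
```

    Because every fragment starts on a pgroup boundary, a receiver can place each packet's samples without knowing the contents of any other packet on the same line.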

    The new uncompressed video draft should cover almost any uncompressed video format, including BT.601, SMPTE 296M and 274M, and future digital cinema formats with 4K x 4K frame size. There is already a Proposed Standard payload format for uncompressed video in RFC 2431; however, it is limited to 4096 scan lines per frame and 2048 pixels per line, and is constrained to 4:2:2 color subsampling of YUV data. This new draft supports up to 64K scan lines and pixels per line and supports RGB as well as YUV data in various color subsamplings. The new draft also provides flexible support for multiple scan lines per packet rather than just one (or a fragment), which may be important for lower data rates or jumbo packets. For each line, there is a 64-bit payload header section to carry the scan line number, scan offset for fragmentation, and length. In contrast, RFC 2431 uses only a 32-bit payload header, although for high-rate video this is not an issue. RFC 2431 indicates the sample size and data type in-band. That information is moved to out-of-band signaling in the new draft, but the details of the SDP parameters remain to be specified.

    Steve Casner said he was not aware of any implementations of RFC 2431 and asked whether there are likely to be implementations of this new format, since acceptance of it as a work item depends on whether it is likely to be used. Ladan responded that her group has an implementation, and other people are working on it as well.

    Philippe Gentric confirmed that the new proposal is more useful, especially for the 4:2:0 YUV native format of JPEG and MPEG, and perhaps even for lower-resolution (CIF or QCIF) images. He suggested adding the capability to specify the pixel aspect ratio.

    Ladan replied that it is not clear how 4:2:0 video should be packetized, since the chrominance info is related to two scan lines of luminance info. She would like feedback on that on the mailing list.

    Those present gave a positive hum for taking on the new draft as an AVT work item.

    RTCP Reporting Extensions

    During IETF week itself, Timur Friedman and Alan Clark collaborated to combine draft-clark-avt-rtcpvoip-01.txt with draft-friedman-avt-rtcp-report-extns-02.txt; the result will be sent to the mailing list shortly. The new draft integrates the VoIP metrics of the Clark draft with the additional RTCP report block types specified by the Friedman draft. These block types allow reporting of packet duplication and loss patterns using run-length encodings; add timestamps for multicast inference of network characteristics (loss rates and delays along logical links within an RTP session); provide a statistics summary with more detailed information than the RTCP SR and RR packets carry; and define a mechanism that allows receivers to measure RTT in the same way that senders can.
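    Receiver-side RTT can use the same arithmetic a sender already applies to RTCP receiver reports: RTT = A - LSR - DLSR, with all values in units of 1/65536 s taken from the middle 32 bits of an NTP timestamp. The Python sketch below assumes the receiver sends a reference timestamp, the peer echoes it back together with how long it held it, and the receiver notes the arrival time; the function and parameter names are hypothetical, and the actual report block formats are defined by the draft.

```python
def ntp_middle32(seconds: float) -> int:
    """Middle 32 bits of a 64-bit NTP timestamp: 16 bits of integer seconds
    and 16 bits of fraction, i.e. time in units of 1/65536 s, mod 2**32."""
    return int(seconds * 65536) & 0xFFFFFFFF


def round_trip_time(arrival: int, echoed_ref: int, peer_delay: int) -> float:
    """RTT = arrival - echoed reference - delay held at the peer.

    All inputs are middle-32-bit values (1/65536 s units). Masking keeps
    the subtraction correct across 32-bit wraparound."""
    rtt_units = (arrival - echoed_ref - peer_delay) & 0xFFFFFFFF
    return rtt_units / 65536
```

    For instance, a reference sent at t = 1000.0 s, held 0.25 s by the peer, and answered at t = 1000.75 s yields an RTT of 0.5 s.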

    Changes in the VoIP metrics relative to the -01 revision of the Clark draft include adding a Gmin parameter to allow the burst density threshold to be adjusted, and expressing the packet loss rate as a binary fraction, as in existing RTCP reports. The estimated MOS quality score has been split into two: a "listening" quality that does not consider the effects of delay, and a "conversational" quality that does.
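    The effect of Gmin on burst and gap densities can be sketched as follows. This Python sketch assumes a burst is a run that starts and ends with a loss and contains fewer than Gmin consecutive received packets, and (so that gap density is meaningful) treats a lone isolated loss as a gap loss; the draft's exact rules may differ. Densities are reported as 8-bit binary fractions, like the RTCP "fraction lost" field, capped at 255 so a fraction of 1 still fits in 8 bits. All names are hypothetical.

```python
def find_bursts(received, gmin=16):
    """Return bursts as (first_index, last_index, losses).

    received: sequence of bools, True = packet received, False = lost.
    Losses separated by fewer than gmin received packets join one burst;
    a group holding a single loss is treated as an isolated gap loss
    (an assumption of this sketch)."""
    groups = []
    for i, ok in enumerate(received):
        if ok:
            continue
        if groups and i - groups[-1][1] - 1 < gmin:
            groups[-1][1] = i          # extend the current burst to this loss
            groups[-1][2] += 1
        else:
            groups.append([i, i, 1])   # start a new candidate burst
    return [tuple(g) for g in groups if g[2] >= 2]


def binary_fraction(num, den):
    """num/den as an 8-bit fixed-point fraction (value * 256), capped at 255."""
    return 0 if den == 0 else min(255, num * 256 // den)


def burst_gap_densities(received, gmin=16):
    """Return (burst_density, gap_density) as 8-bit binary fractions."""
    bursts = find_bursts(received, gmin)
    burst_packets = sum(last - first + 1 for first, last, _ in bursts)
    burst_losses = sum(losses for _, _, losses in bursts)
    gap_packets = len(received) - burst_packets
    gap_losses = list(received).count(False) - burst_losses
    return (binary_fraction(burst_losses, burst_packets),
            binary_fraction(gap_losses, gap_packets))
```

    Lowering Gmin splits losses into shorter, denser bursts and pushes more losses into the gap, which is why an adjustable threshold changes what the report calls "bursty".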

    One motivation for adding the VoIP metrics is to allow VoIP service providers to get feedback on the quality experienced by end users inside an enterprise, behind a firewall. Comparing the VoIP metrics with SLA monitoring on the service provider's side of the firewall allows the service provider to determine whether problems are in the WAN or the enterprise network.

    Steve Casner expressed concern that the "implementation specific" fields of the VoIP metrics block are totally unspecified. The draft either needs to say how these fields will be specified, or define them to be always zero until some future specification revises this one. It is not reasonable to say the bits are open for arbitrary use.

    Al Morton said that the burst parameters may not be aligned with those of the E-Model produced by ITU Study Group 12 in May. However, Henry Sinnreich believes the E-Model is inappropriate for the Internet. Alan Clark responded that he is familiar with the E-Model, but has deliberately kept these metrics independent of what model is used because some people will want to use the E-Model and some will use other models.

    Magnus Westerlund asked what status would be assigned to this document (Proposed Standard or Experimental). Timur responded that the group decided in Minneapolis to go for Experimental. Magnus suggested we rethink that, and go for Proposed. He would really like to have the RTT measurements, for example to use with retransmission.

    Steve Casner explained that the reason we decided on Experimental was that it was unclear how much these measures added above what is already in RTCP. The authors have been doing measurements to quantify that, and there is more evidence now that implementers are ready to use at least some of these functions in practice. That use would be better served by Proposed Standard status than by Experimental.

    RTP Retransmission

    Perhaps the most significant progress at this meeting resulted from side discussions among the authors of the two alternative proposals contained in drafts draft-ietf-avt-rtp-retransmission-01.txt and draft-ietf-avt-selret-05.txt for RTP retransmission based on RTCP feedback. Jose Rey provided a consensus report from these discussions.

    Merging of the two approaches was enabled by the recognition and acceptance of a requirement for the solution to be able to indicate explicitly which RTP packets were lost. The technique from the "selret" draft for multiplexing the initial transmissions and retransmissions in one stream by sharing the sequence number space does not allow this. Therefore, that proposal will be abandoned in favor of carrying the initial transmissions and retransmissions as separate streams, but to reduce the number of UDP ports required, the streams will be multiplexed on the SSRC id so long as no problems with that approach are found.
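    Carrying retransmissions as a separate stream is what makes losses explicit: each SSRC keeps its own sequence number space, so a hole in the original stream names exactly which packet is missing. The Python sketch below parses the 12-byte fixed RTP header defined in the RTP specification and tracks missing sequence numbers per SSRC. The class and function names and the simple gap logic are illustrative assumptions, not the merged draft's procedure, and mapping a retransmission back to the original packet it repairs is not shown.

```python
import struct

RTP_FIXED = struct.Struct("!BBHII")  # V/P/X/CC, M/PT, sequence, timestamp, SSRC


def parse_rtp(packet: bytes) -> dict:
    """Parse the 12-byte fixed RTP header (RTP version 2)."""
    if len(packet) < RTP_FIXED.size:
        raise ValueError("packet shorter than RTP fixed header")
    b0, b1, seq, timestamp, ssrc = RTP_FIXED.unpack_from(packet)
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    return {"marker": b1 >> 7, "pt": b1 & 0x7F, "seq": seq,
            "timestamp": timestamp, "ssrc": ssrc}


class LossTracker:
    """Tracks missing sequence numbers for one SSRC (16-bit wraparound)."""

    def __init__(self):
        self.highest = None
        self.missing = set()

    def on_packet(self, seq: int) -> None:
        if self.highest is None:
            self.highest = seq
            return
        delta = (seq - self.highest) & 0xFFFF
        if 0 < delta < 0x8000:         # newer packet: record any hole
            for k in range(1, delta):
                self.missing.add((self.highest + k) & 0xFFFF)
            self.highest = seq
        else:                           # late or duplicate packet fills a hole
            self.missing.discard(seq)


def demux(packets, trackers=None):
    """Dispatch packets to per-SSRC trackers; returns the tracker map."""
    trackers = {} if trackers is None else trackers
    for packet in packets:
        hdr = parse_rtp(packet)
        trackers.setdefault(hdr["ssrc"], LossTracker()).on_packet(hdr["seq"])
    return trackers
```

    Because the retransmission stream never advances the original stream's sequence numbers, the set of missing numbers for the original SSRC is exactly the list a receiver would request.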

    Steve Casner observed that it may not be necessary to restrict the solution to just SSRC multiplexing or just port multiplexing. For example, the FEC payload format in RFC 2733 allows either. For some applications it may be more important to restrict the number of ports used (favoring SSRC multiplexing), while for others it may be more important to allow selectability in receiving both streams or just the initial transmissions (favoring port multiplexing).

    Jose expressed concern that applications would not know whether peers had implemented one or both methods. Rahul Agarwal was also concerned that if there are two solutions, that requires either the server or the client to implement both for maximum interoperability. Steve responded that it could be part of the session signaling or might be a fixed characteristic for a particular application. He explained that the selection of which method is used might be fixed in a given RTP profile or by an implementation agreement for a particular application, so there would be no interoperability issue within that application. But the payload format could remain flexible to fit different requirements for different applications. Magnus Westerlund considered the implementation difference to be small, so you could cheaply implement both. Colin Perkins said we need to evaluate the value of having both methods available. If only one approach is needed, that would be preferable, but there is no objection to having both if they are needed.

    A separate issue is the SEL payload format in the "selret" draft for communicating packet priority. This allows losses to be reported only for "important" packets, reducing the feedback bandwidth. However, some people contend that the bandwidth saved is not significant. To facilitate a decision about including the SEL format, its performance will be evaluated quantitatively against reporting all losses.

    A work plan was outlined, calling for the mentioned evaluations to be completed no later than September so a merged draft could be posted in October. The goal is a WG last call in December.

    SSRC Multiplexing

    The question of whether it is acceptable to use SSRC multiplexing in RTP retransmission is a specific case of a more general question: should the identifier for an "RTP session" be redefined to include the SSRC identifier in addition to the destination transport address? In other words, why disallow multiplexing of RTP sessions based on SSRC identifier? This was a bonus topic for the previous AVT meeting in Minneapolis that was not presented due to lack of time, so Steve Casner revived the presentation in this meeting.

    The primary reason why we can't change the definition of an RTP session is that there are scenarios where multiple sources are intended to be combined in one session, such as multiple audio streams being summed in a multiparty conference or multiple video cameras on one workstation being transmitted in the same session. That's why the SSRC identifier was added to the RTP header in the first place. It allows incoming streams to be distinguished independently of the source transport address since the stream might flow through an RTP translator such that the original source transport address is lost.

    In addition, Section 5.2 of the RTP specification lists several reasons why neither the SSRC id nor the payload type field should be used for multiplexing RTP sessions, in particular sessions of different media. However, for some applications, the implementers feel that these considerations do not apply. Those implementers are more concerned about the requirement to use a large number of UDP ports to multiplex the RTP sessions, because the performance of some operating systems degrades severely in that situation (due to an inefficient search to match ports to sockets on incoming packets). Rather than changing the definition of an RTP session, perhaps the energy should be spent getting operating system inefficiencies fixed instead?

    One problem with using the SSRC for multiplexing when streams originate on multiple hosts is that the assignment of SSRC identifiers must be coordinated among the sources. Roni Even pointed out that multiplexing on the SSRC id introduces another level of demultiplexing which precludes the receiver from dispatching the sources to different processes, and that one stream can impact the latency of another (independent) stream.

    Rahul Agarwal agreed that it would be nice if we could fix the OS, but it is a long and difficult process to convince the vendors to do so. There is also a problem that some operating systems limit the number of ports per process, thus requiring multiple processes for a high-scale server.

    Steve said what this question boils down to is whether we need to add extra words in Section 5.2 to relax the guideline against SSRC id multiplexing or to say under what conditions it is acceptable to violate the guideline. His personal preference is not to make any changes, but instead to allow other documents, such as the RTP retransmission specification, to explain why a choice of SSRC multiplexing was used and why it is not a problem. He asked if anyone feels strongly that the text should be changed.

    Rahul replied that the considerations in the existing text are all related to multiplexing multiple different media streams, so those clauses don't apply to the case of RTP retransmission. He would like to see a clause added to say that for a single medium SSRC multiplexing is OK.

    Steve noted that some text has been added in the revised profile to explain that the prohibition against multiplexing on SSRC id or payload type is in particular for trying to put different media together. Multiple sources of one medium in one session are allowed and expected when they are to be combined and processed together, and switching payload types on the fly to change encodings is also perfectly normal; that is the reason the payload type field is in the RTP header rather than being signaled out of band.

    RTCP Extensions for Single-Source Multicast Sessions

    Julian Chesterfield presented an update on the draft specifying unicast feedback for group sessions, draft-ietf-avt-rtcpssm-01.txt, to facilitate use of RTP with single-source multicast. In previous discussions of this draft, we've established the need to address the security issues it introduces. A good analysis of the security threats and an evaluation of the existing solutions has been written as a separate document to work toward the finished solution (see

    The current focus is to identify a level of security that should be mandated by the draft. The goal is to provide the same level of guarantee as the current RTCP for any-source multicast. That is, although additional security services or protections might be desired, it is not a requirement for the rtcpssm solution to provide stronger protection than does the current multicast RTCP. On the other hand, replay defense is an example of an additional service that may be an inherent side benefit of any security mechanism that meets the basic requirements.

    Steve Casner agreed that a higher level of protection is not a requirement for the basic level of operation with SSM. However, additional services such as SRTP can be added to RTP in any-source multicast operation, and it is a requirement that these additional services be usable with SSM as well. The security issues we want to address with rtcpssm are the new risks such as denial of service attacks that are introduced by unicast RTCP, not confidentiality and admission control.

    Julian went on to say that the fundamental defense is authentication of the feedback address (the destination for the unicast RTCP) and authentication of the RTCP information from the multicast source, which controls the bandwidth calculations. Given that authentication of the RTCP packets from the multicast source is required, one solution for authenticating the feedback address is to send it in-band with the multicast RTCP. This also allows changing the feedback address during the session if needed. Another option is to use out-of-band signaling, e.g., in SDP, with an authenticated transport mechanism. Julian is seeking feedback from the group on this choice. He also asked to what extent the specification should recommend specific approaches for security functions rather than just establish requirements.
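    One way to picture the in-band option: the multicast source attaches an authentication tag to the feedback address it announces, and receivers accept the address only if the tag verifies. The Python sketch below uses HMAC-SHA1 truncated to 80 bits (the default tag length in SRTP) and assumes a key already shared by the group; the "host:port" encoding, the key distribution and all names are hypothetical, and the sketch does not provide replay protection.

```python
import hashlib
import hmac

TAG_LEN = 10  # 80-bit truncated HMAC-SHA1 tag (choice borrowed from SRTP)


def sign_feedback_address(key: bytes, host: str, port: int) -> bytes:
    """Hypothetical encoding: b"host:port" followed by a truncated HMAC tag."""
    msg = f"{host}:{port}".encode()
    tag = hmac.new(key, msg, hashlib.sha1).digest()[:TAG_LEN]
    return msg + tag


def verify_feedback_address(key: bytes, blob: bytes):
    """Return (host, port) if the tag verifies, otherwise None."""
    msg, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(key, msg, hashlib.sha1).digest()[:TAG_LEN]
    if not hmac.compare_digest(tag, expected):
        return None  # forged or corrupted address: do not send feedback there
    host, port = msg.decode().rsplit(":", 1)
    return host, int(port)
```

    A receiver that rejects unverified addresses cannot be redirected by an attacker to flood an arbitrary host with unicast RTCP, which is the new denial-of-service risk discussed above.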

    Colin Perkins replied that we should require approaches that make it as secure as "normal RTCP", and then we can recommend additions that make it more secure. For example, it is appropriate to say the feedback identifier MUST be authenticated, but it is not clear whether there is one single authentication solution that is always appropriate and therefore must be implemented, or whether there are several solutions and you should implement one of them or something else with equivalent security. We may need to use different alternatives for signaling done in different ways, so we could not mandate just one.

    Another participant asked: if the purpose of "MUST" in a specification is to achieve interoperability, how can the choice of approach be left open? If two implementations make different choices, they can't interoperate.

    Colin replied that it would be good to include in the rtcpssm specification how the security would be done for a couple of common signaling protocols (e.g., RTSP and SIP), as well as how it would be done for the in-band RTCP (since there are two parts to the problem). Then, if you are using a different signaling protocol you MUST achieve the security requirements, but how you do it is to be specified separately.

    Philippe Gentric suggested adding a criterion that the solution should be suitable for operation through a NAT in which the apparent address for feedback might be changed.


    This AVT session was unusual in that we reached the end of the agenda before the end of the session (which has not happened for years). Steve Casner mentioned a couple of topics regarding the revision of the RTP specification that had not been put on the agenda. The code currently in the appendix indicates a packet loss value of 1 when no packets have yet been received. Steve had planned to work out a solution and present it here, but didn't get that done. We can include such a (small) change as a comment to the RFC-editor even though the draft has been reviewed by the IESG. He asked if anyone had fixed that bug in their implementations, but nobody said yes. Steve is also considering adding a definition of the term "sampling instant" in the revised RTP specification to explain what it means in scenarios other than live sampling of the media. Contributions for either of these additions would be welcome.
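    The loss bug follows from the arithmetic of the appendix code: the expected count is the extended highest sequence number minus the base sequence number plus one, so any state in which the two are equal while zero packets have been counted reports one lost packet. The Python sketch below mirrors that calculation and the interval fraction-lost computation, with an early return when nothing has been received; that guard is a hypothetical fix, not necessarily the change that will go into the revised specification.

```python
def cumulative_lost(extended_max: int, base_seq: int, received: int) -> int:
    """Cumulative number of packets lost, clamped to a signed 24-bit field."""
    if received == 0:
        return 0  # hypothetical guard: nothing received yet, report no loss
    expected = extended_max - base_seq + 1
    lost = expected - received  # may go negative when duplicates arrive
    return max(-0x800000, min(0x7FFFFF, lost))


def fraction_lost(expected_interval: int, received_interval: int) -> int:
    """Fraction lost in the last interval as an 8-bit fixed-point value."""
    lost_interval = expected_interval - received_interval
    if expected_interval == 0 or lost_interval <= 0:
        return 0
    return (lost_interval << 8) // expected_interval
```

    Without the guard, cumulative_lost(5, 5, 0) would return 1, which is the spurious value described above.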

    Action Items

    We took two "hums" during the meeting which need confirmation on the mailing list. They were to accept the payload formats for uncompressed video (draft-gharai-avt-uncomp-video-00.txt) and for JVT video (draft-wenger-avt-rtp-jvt-01.txt) as working group tasks. We solicit confirmations or objections on these actions -- we want to hear both "yeas" and "nays".


    MIDI Wire Protocol Packetization
    Payload for Interleaved Audio
    Payload Format for the AC-3 Audio Coder
    Payload Format for JPEG 2000 video streams
    Payload for JVT Video
    ISO/IEC 14496 'MPEG-4' transport
    Payload format for MPEG-4 FlexMux
    Payload format for iLBC speech
    Payload format for SMPTE 292M & uncompressed video
    RTCP reporting extensions & report block for VoIP
    RTP retransmission
    Multiplexing RTP based on SSRC ID
    RTCP extensions for Single Source Multicast sessions