| < draft-civanlar-bmpeg-01.txt | draft-civanlar-bmpeg-02.txt > | |||
|---|---|---|---|---|
| Internet Engineering Task Force M. Reha Civanlar | Internet Engineering Task Force M. Reha Civanlar | |||
| INTERNET-DRAFT Glenn L. Cash | INTERNET-DRAFT Glenn L. Cash | |||
| File: draft-civanlar-bmpeg-01.txt Barry G. Haskell | File: draft-civanlar-bmpeg-02.txt Barry G. Haskell | |||
| Expire in six months | ||||
| AT&T Labs-Research | AT&T Labs-Research | |||
| February, 1997 | November, 1997 | |||
| RTP Payload Format for Bundled MPEG | RTP Payload Format for Bundled MPEG | |||
| Status of this Memo | Status of this Memo | |||
| This document is an Internet-Draft. Internet-Drafts are working | This document is an Internet-Draft. Internet-Drafts are working | |||
| documents of the Internet Engineering Task Force (IETF), its areas, | documents of the Internet Engineering Task Force (IETF), its areas, | |||
| and its working groups. Note that other groups may also distribute | and its working groups. Note that other groups may also distribute | |||
| working documents as Internet-Drafts. | working documents as Internet-Drafts. | |||
| skipping to change at page 1, line 36 ¶ | skipping to change at page 1, line 36 ¶ | |||
| ``1id-abstracts.txt'' listing contained in the Internet- Drafts Shadow | ``1id-abstracts.txt'' listing contained in the Internet- Drafts Shadow | |||
| Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), | Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe), | |||
| munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or | munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or | |||
| ftp.isi.edu (US West Coast). | ftp.isi.edu (US West Coast). | |||
| Distribution of this memo is unlimited. | Distribution of this memo is unlimited. | |||
| Abstract | Abstract | |||
| This document describes a payload type for bundled, MPEG-2 encoded | This document describes a payload type for bundled, MPEG-2 encoded | |||
| video and audio data to be used with RTP, version 2. Bundling has some | video and audio data that may be used with RTP, version 2. Bundling | |||
| advantages for this payload type particularly when it is used for | has some advantages for this payload type particularly when it is used | |||
| video-on-demand applications. This payload type is to be used when its | for video-on-demand applications. This payload type may be used when | |||
| advantages are important enough to sacrifice the modularity of having | its advantages are important enough to sacrifice the modularity of | |||
| separate audio and video streams. | having separate audio and video streams. | |||
| A technique to improve packet loss resilience based on "out-of-band" | A technique to improve packet loss resilience based on ''out-of-band'' | |||
| transmission of MPEG-2 specific, vital information is described also. | transmission of MPEG-2 specific, vital information is described as an | |||
| Appendix. | ||||
| A section on security considerations for this payload type is added. | ||||
| 1. Introduction | 1. Introduction | |||
| This document describes a bundled packetization scheme for MPEG-2 | This document describes a bundled packetization scheme for MPEG-2 | |||
| encoded audio and video streams using the Real-time Transport Protocol | encoded audio and video streams using the Real-time Transport Protocol | |||
| (RTP), version 2 [1]. | (RTP), version 2 [1]. | |||
| The MPEG-2 International standard consists of three layers: audio, | The MPEG-2 International standard consists of three layers: audio, | |||
| video and systems [2]. The audio and the video layers define the | video and systems [2]. The audio and the video layers define the | |||
| syntax and semantics of the corresponding "elementary streams." The | syntax and semantics of the corresponding "elementary streams." The | |||
| systems layer supports synchronization and interleaving of multiple | systems layer supports synchronization and interleaving of multiple | |||
| compressed streams, buffer initialization and management, and time | compressed streams, buffer initialization and management, and time | |||
| identification. RFC 2038 [3] describes packetization techniques to | identification. RFC 2038 [3] describes packetization techniques to | |||
| skipping to change at page 2, line 30 ¶ | skipping to change at page 2, line 33 ¶ | |||
| together. Its advantages over independent packetization of audio and | together. Its advantages over independent packetization of audio and | |||
| video are: | video are: | |||
| 1. Uses a single port per "program" (i.e. bundled A/V). | 1. Uses a single port per "program" (i.e. bundled A/V). | |||
| This may increase the number of streams that can be served | This may increase the number of streams that can be served | |||
| e.g., from a VOD server. Also, it eliminates the performance | e.g., from a VOD server. Also, it eliminates the performance | |||
| hit when two ports are used for the separate audio and video | hit when two ports are used for the separate audio and video | |||
| messages on the client side. | messages on the client side. | |||
| 2. Provides implicit synchronization of audio and video. | 2. Provides implicit synchronization of audio and video. | |||
| The server need not do anything else (e.g. generate | This is particularly convenient when the A/V data is stored | |||
| RTCP packets) for this purpose. This is particularly | in an interleaved format at the server. | |||
| convenient when the A/V data is stored in an interleaved | ||||
| format at the server and no stream other than the bundled | ||||
| A/V is to be transmitted during the session. | ||||
| 3. Reduces the header overhead. Since using large packets | 3. Reduces the header overhead. Since using large packets | |||
| increases the effects of losses and delay, audio only | increases the effects of losses and delay, audio only | |||
| packets need to be smaller increasing the overhead. An | packets need to be smaller increasing the overhead. An | |||
| A/V bundled format can provide about 1% overall overhead | A/V bundled format can provide about 1% overall overhead | |||
| reduction. Considering the high bitrates used for MPEG-2 | reduction. Considering the high bitrates used for MPEG-2 | |||
| encoded material, e.g. 4 Mbps, the number of bits saved, | encoded material, e.g. 4 Mbps, the number of bits saved, | |||
| e.g. 40 Kbps, may provide noticeable audio or video | e.g. 40 Kbps, may provide noticeable audio or video | |||
| quality improvement. | quality improvement. | |||
| skipping to change at page 3, line 11 ¶ | skipping to change at page 3, line 10 ¶ | |||
| let's assume that using two buffers, each with a size B, | let's assume that using two buffers, each with a size B, | |||
| is sufficient with probability P when each stream is | is sufficient with probability P when each stream is | |||
| transmitted individually. The probability that the same | transmitted individually. The probability that the same | |||
| buffer size will be sufficient when both streams need to | buffer size will be sufficient when both streams need to | |||
| be received is P times the conditional probability of B | be received is P times the conditional probability of B | |||
| being sufficient for the second stream given that it was | being sufficient for the second stream given that it was | |||
| sufficient for the first one. This conditional probability | sufficient for the first one. This conditional probability | |||
| is, generally, less than one requiring use of a larger | is, generally, less than one requiring use of a larger | |||
| buffer size to achieve the same probability level. | buffer size to achieve the same probability level. | |||
| 5. May help with the control of the overall bandwidth used | ||||
| by an A/V program. | ||||
| And, the advantages over packetization of the transport layer streams | And, the advantages over packetization of the transport layer streams | |||
| are: | are: | |||
| 1. Reduced overhead. It does not contain systems layer | 1. Reduced overhead. It does not contain systems layer | |||
| information which is redundant for the RTP (essentially | information which is redundant for the RTP (essentially | |||
| they address similar issues). | they address similar issues). | |||
| 2. Easier error recovery. Because of the structured | 2. Easier error recovery. Because of the structured | |||
| packetization consistent with the ALF principle, loss | packetization consistent with the application layer | |||
| concealment and error recovery can be made simpler and | framing (ALF) principle, loss concealment and error | |||
| more effective. | recovery can be made simpler and more effective. | |||
| 2. Encapsulation of Bundled MPEG Video and Audio | 2. Encapsulation of Bundled MPEG Video and Audio | |||
| Video encapsulation follows the rules described in [3] with the addition | Video encapsulation follows rules similar to the ones described in [3] | |||
| of the following: | for encapsulation of MPEG elementary streams. Specifically, | |||
| each packet must contain an integral number of video slices | 1. The MPEG Video_Sequence_Header, when present, will always | |||
| be at the beginning of an RTP payload. | ||||
| 2. An MPEG GOP_header, when present, will always be at the | ||||
| beginning of the RTP payload, or will follow a | ||||
| Video_Sequence_Header. | ||||
| 3. An MPEG Picture_Header, when present, will always be at the | ||||
| beginning of a RTP payload, or will follow a GOP_header. | ||||
| In addition to these, it is required that: | ||||
| 4. Each packet must contain an integral number of video slices. | ||||
| It is the application's responsibility to adjust the slice sizes and the | ||||
| number of slices put in each RTP packet so that lower level | ||||
| fragmentation does not occur. This approach simplifies the receivers | ||||
| while somewhat increasing the complexity of the transmitter's | ||||
| packetizer. Considering that a slice can be as small as a single | ||||
| macroblock, it is possible to prevent fragmentation for most of the | ||||
| cases. If a packet size exceeds the path maximum transmission unit | ||||
| (path-MTU) [4], this payload type depends on the lower protocol layers | ||||
| for fragmentation and this may cause problems with packet classification | ||||
| for integrated services (e.g. with RSVP). | ||||
| The video data is followed by a sufficient number of integral audio | The video data is followed by a sufficient number of integral audio | |||
| frames to cover the duration of the video segment included in a packet. | frames to cover the duration of the video segment included in a packet. | |||
| For example, if the first packet contains three 1/900 seconds long | For example, if the first packet contains three 1/900 seconds long | |||
| slices of video, and Layer I audio coding is used at a 44.1kHz sampling | slices of video, and Layer I audio coding is used at a 44.1kHz sampling | |||
| rate, only one audio frame covering 384/44100 seconds of audio need be | rate, only one audio frame covering 384/44100 seconds of audio need be | |||
| included in this packet. Since the length of this audio frame (8.71 | included in this packet. Since the length of this audio frame (8.71 | |||
| msec.) is longer than that of the video segment contained in this packet | msec.) is longer than that of the video segment contained in this packet | |||
| (3.33 msec), the next few packets may not contain any audio frames until | (3.33 msec), the next few packets may not contain any audio frames until | |||
| the packet in which the covered video time extends outside the length of | the packet in which the covered video time extends outside the length of | |||
| the previously transmitted audio frames. Alternatively, it is possible, | the previously transmitted audio frames. Alternatively, it is possible, | |||
| in this proposal, to repeat the latest audio frame in "no-audio" packets | in this proposal, to repeat the latest audio frame in "no-audio" packets | |||
| for packet loss resilience. | for packet loss resilience. Again, it is the application's | |||
| responsibility to adjust the bundled packet size according to the | ||||
| minimum MTU size to prevent fragmentation. | ||||
| 2.1. RTP Fixed Header for BMPEG Encapsulation | 2.1. RTP Fixed Header for BMPEG Encapsulation | |||
| The following RTP header fields are used: | The following RTP header fields are used: | |||
| Payload Type: A distinct payload type number should be assigned to | Payload Type: A distinct payload type number, which may be dynamic, | |||
| BMPEG. | should be assigned to BMPEG. | |||
| M Bit: Set for packets containing end of a picture. | M Bit: Set for packets containing end of a picture. | |||
| timestamp: 32-bit 90 kHz timestamp representing transmission time of | timestamp: 32-bit 90 kHz timestamp representing sampling time of the | |||
| the MPEG picture and is monotonically increasing. Same for all packets | MPEG picture. May not be monotonically increasing if B pictures are | |||
| belonging to the same picture. For packets that contain only a | present. Same for all packets belonging to the same picture. For | |||
| sequence, extension and/or GOP header, the timestamp is that of the | packets that contain only a sequence, extension and/or GOP header, the | |||
| subsequent picture. | timestamp is that of the subsequent picture. | |||
| 2.2. BMPEG Specific Header: | 2.2. BMPEG Specific Header: | |||
| 0 1 2 3 | 0 1 2 3 | |||
| 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| |MBZ|R|N| P | Audio Length | Audio Offset | | | P |N|MBZ| Audio Length | | Audio Offset | | |||
| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | |||
| MBZ | ||||
| MBZ: Reserved for future use (2 bits). They must be set to zero now. | P: Picture type (2 bits). I (0), P (1), B (2). | |||
| R: Redundant audio (1 bit). Set if the audio frame contained in the | ||||
| packet is a repetition of the last audio frame. | ||||
| N: Header data changed (1 bit). Set if any part of the video sequence, | N: Header data changed (1 bit). Set if any part of the video sequence, | |||
| extension, GOP and picture header data is different than that of the | extension, GOP and picture header data is different than that of the | |||
| previously sent headers. It gets reset, when all the header data gets | previously sent headers. It gets reset when all the header data gets | |||
| repeated. | repeated (see Appendix 2). | |||
| P: Picture type (2 bits). I (0), P (1), B (2). | MBZ: Must be zero. Reserved for future use. | |||
| Audio Length: (10 bits) Length of the audio data in this packet in | Audio Length: (10 bits) Length of the audio data in this packet in | |||
| bytes. | bytes. Start of the audio data is found by subtracting "Audio Length" | |||
| from the total length of the received packet. | ||||
| Audio Offset: (16 bits) The offset between the audio frame and the | Audio Offset: (16 bits) The offset between the start of the audio | |||
| start of the video segment in this packet in number of audio samples. | frame and the RTP timestamp for this packet in number of audio samples | |||
| (for multi-channel sources, a set of samples covering all channels is | ||||
| counted as one sample for this purpose.) | ||||
| 3. Out-of-band Transmission of the "High Priority" Information | Audio offset is a signed integer in two's complement form. It allows a | |||
| ~ +/- 750 msec offset at 44.1 KHz audio sampling rate. For a very low | ||||
| video frame rate (e.g., a frame per second), this offset may not be | ||||
| sufficient and this payload format may not be usable. | ||||
| If B frames are present, audio frames are not re-ordered along with | ||||
| video. Instead, they are packetized along with video frames in their | ||||
| transmission order (e.g., an audio segment packetized with a video | ||||
| segment corresponding to a P picture may belong to a B picture, which | ||||
| will be transmitted later and should be rendered at the same time with | ||||
| this audio segment.) Even though the video segments are reordered, the | ||||
| audio offset for a particular audio segment is still relative to the | ||||
| RTP timestamp in the packet containing that audio segment. | ||||
| Since a special picture counter, such as the "temporal reference | ||||
| (TR)" field of [3], is not included in this payload format, lost GOP | ||||
| headers may not be detected. The only effect of this may be incorrect | ||||
| decoding of the B pictures immediately following the lost GOP header | ||||
| for some edited video material. | ||||
| 3. Security Considerations | ||||
| RTP packets using the payload format defined in this specification are | ||||
| subject to the security considerations discussed in the RTP | ||||
| specification [1]. This implies that confidentiality of the media | ||||
| streams is achieved by encryption. Because the data compression used | ||||
| with this payload format is applied end-to-end, encryption may be | ||||
| performed after compression so there is no conflict between the two | ||||
| operations. | ||||
| This payload type does not exhibit any significant non-uniformity in the | ||||
| receiver side computational complexity for packet processing to cause a | ||||
| potential denial-of-service threat. | ||||
| A security review of this payload format found no additional | ||||
| considerations beyond those in the RTP specification. | ||||
| Appendix 1. Out-of-band Transmission of the "High Priority" Information | ||||
| In MPEG encoded video, loss of the header information, which includes | In MPEG encoded video, loss of the header information, which includes | |||
| sequence, GOP, and picture headers, and the corresponding extensions, | sequence, GOP, and picture headers, and the corresponding extensions, | |||
| causes severe degradations in the decoded video. When possible, | causes severe degradations in the decoded video. When possible, | |||
| dependable transmission of the header information to the receivers can | dependable transmission of the header information to the receivers can | |||
| improve the loss resiliency of MPEG video significantly [4]. RFC 2038 | improve the loss resiliency of MPEG video significantly [5]. RFC 2038 | |||
| describes a payload type where the header information can be repeated in | describes a payload type where the header information can be repeated in | |||
| each RTP packet. Although this is a straightforward approach, it may | each RTP packet. Although this is a straightforward approach, it may | |||
| increase the overhead. | increase the overhead. | |||
| The "data partitioning" method in MPEG-2 defines the syntax and | The "data partitioning" method in MPEG-2 defines the syntax and | |||
| semantics for partitioning an MPEG-2 encoded video bitstream into "high | semantics for partitioning an MPEG-2 encoded video bitstream into "high | |||
| priority" and "low priority" parts. If the "high priority" (HP) part is | priority" and "low priority" parts. If the "high priority" (HP) part is | |||
| selected to contain only the header information, it is less than two | selected to contain only the header information, it is less than two | |||
| percent of the video data and can be transmitted before the start of the | percent of the video data and can be transmitted before the start of the | |||
| real-time transmission using a reliable protocol. In order to | real-time transmission using a reliable protocol. In order to | |||
| synchronize the HP data with the corresponding real-time stream, the | synchronize the HP data with the corresponding real-time stream, the | |||
| initial value of the timestamp for the real-time stream may be inserted | initial value of the timestamp for the real-time stream may be inserted | |||
| at the beginning of the HP data. | at the beginning of the HP data. | |||
| Alternatively, the HP data may be transmitted along with the A/V data | Alternatively, the HP data may be transmitted along with the A/V data | |||
| using layered multimedia transmission techniques for RTP [5]. | using layered multimedia transmission techniques for RTP [6]. | |||
| Appendix 1. Error Recovery | Appendix 2. Error Recovery | |||
| Packet losses can be detected from a combination of the sequence number | Packet losses can be detected from a combination of the sequence number | |||
| and the timestamp fields of the RTP fixed header. The extent of the loss | and the timestamp fields of the RTP fixed header. The extent of the loss | |||
| can be determined from the timestamp, the slice number and the | can be determined from the timestamp, the slice number and the | |||
| horizontal location of the first slice in the packet. The slice number | horizontal location of the first slice in the packet. The slice number | |||
| and the horizontal location can be determined from the slice header and | and the horizontal location can be determined from the slice header and | |||
| the first macroblock address increment, which are located at fixed bit | the first macroblock address increment, which are located at fixed bit | |||
| positions. | positions. | |||
| If lost data consists of slices all from the same picture, new data | If lost data consists of slices all from the same picture, new data | |||
| following the loss can simply be given to the video decoder which will | following the loss may simply be given to the video decoder which will | |||
| normally repeat missing pixels from a previous picture. The next audio | normally repeat missing pixels from a previous picture. The next audio | |||
| frame must be delayed by the duration of the lost video segment. | frame must be played at the appropriate time determined by the timestamp | |||
| and the audio offset contained in the received packet. Appropriate audio | ||||
| frames (e.g., representing background noise) may need to be fed to the | ||||
| audio decoder in place of the lost audio frames to keep the lip-synch | ||||
| and/or to conceal the effects of the losses. | ||||
| If the received new data after a loss is from the next picture and the N | If the received new data after a loss is from the next picture (i.e. no | |||
| bit is not set, previously received headers for the particular picture | complete picture loss) and the N bit is not set, previously received | |||
| type (determined from the P bits) can be given to the video decoder | headers for the particular picture type (determined from the P bits) can | |||
| followed by the new data. If N is set, data deletion until a new picture | be given to the video decoder followed by the new data. If N is set, | |||
| start code is advisable unless headers are available from previously | data deletion until a new picture start code is advisable unless headers | |||
| received HP data. In both cases audio needs to be delayed properly. | are available from previously received HP data. | |||
| If data for more than one picture is lost and HP data is not available, | If data for more than one picture is lost and HP data is not available, | |||
| resynchronization to a new video sequence header is advisable. | unless N is zero and at least one packet has been received for every | |||
| intervening picture of the same type and that the N bit was 0 for each | ||||
| of those pictures, resynchronization to a new video sequence header is | ||||
| advisable. | ||||
| In all cases of large packet losses, if the HP data is available, | In all cases of large packet losses, if the HP data is available, | |||
| appropriate portions of it can be given to the video decoder and the | appropriate portions of it can be given to the video decoder and the | |||
| received data can be used irrespective of the N bit value or the number | received data can be used irrespective of the N bit value or the number | |||
| of lost pictures. | of lost pictures. | |||
| Appendix 2. Resynchronization | Appendix 3. Resynchronization | |||
| As described in [3], use of frequent video sequence headers makes it | As described in [3], use of frequent video sequence headers makes it | |||
| possible to join in a program at arbitrary times. Also, it reduces the | possible to join in a program at arbitrary times. Also, it reduces the | |||
| resynchronization time after severe losses. | resynchronization time after severe losses. | |||
| References: | References: | |||
| [1] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, | [1] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, | |||
| "RTP: A Transport Protocol for Real-Time Applications," | "RTP: A Transport Protocol for Real-Time Applications," | |||
| RFC 1889, January 1996. | RFC 1889, January 1996. | |||
| [2] ISO/IEC International Standard 13818; "Generic coding of moving | [2] ISO/IEC International Standard 13818; "Generic coding of moving | |||
| pictures and associated audio information," November 1994. | pictures and associated audio information," November 1994. | |||
| [3] D. Hoffman, G. Fernando, S. Kleiman, V. Goyal, "RTP Payload Format | [3] D.Hoffman, G. Fernando, V. Goyal, M. R. Civanlar, "RTP Payload Format | |||
| for MPEG1/MPEG2 Video," RFC 2038, October 1996. | for MPEG1/MPEG2 Video," draft-ietf-avt-mpeg-new-00.txt, April 1997. | |||
| [4] M. R. Civanlar, G. L. Cash, "A practical system for MPEG-2 based | [4] J. Mogul, S. Deering, "Path MTU Discovery," RFC 1191, November 1990. | |||
| [5] M. R. Civanlar, G. L. Cash, "A practical system for MPEG-2 based | ||||
| video-on-demand over ATM packet networks and the WWW," Signal | video-on-demand over ATM packet networks and the WWW," Signal | |||
| Processing: Image Communication, no. 8, pp. 221-227, Elsevier, 1996. | Processing: Image Communication, no. 8, pp. 221-227, Elsevier, 1996. | |||
| [5] M. F. Speer, S. McCanne, "RTP Usage with Layered Multimedia | [6] M. F. Speer, S. McCanne, "RTP Usage with Layered Multimedia | |||
| Streams," Internet Draft, draft-speer-avt-layered-video-02.txt, | Streams," Internet Draft, draft-speer-avt-layered-video-02.txt, | |||
| December 1996. | December 1996. | |||
| Author's Address: | Author's Address: | |||
| M. Reha Civanlar | M. Reha Civanlar | |||
| Glenn L. Cash | Glenn L. Cash | |||
| Barry G. Haskell | Barry G. Haskell | |||
| AT&T Labs-Research | AT&T Labs-Research | |||
| 101 Crawfords Corner Road | 100 Schultz Drive | |||
| Holmdel, NJ 07733 | Red Bank, NJ 07701 | |||
| USA | USA | |||
| e-mail: civanlar|glenn|bgh@research.att.com | e-mail: civanlar|glenn|bgh@research.att.com | |||
| End of changes. 34 change blocks. | ||||
| 61 lines changed or deleted | 135 lines changed or added | |||
This html diff was produced by rfcdiff 1.48. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||