draft-ietf-avt-rtp-h263-video-01.txt
Internet Engineering Task Force         Audio-Video Transport WG
INTERNET-DRAFT                          C. Bormann / Univ. Bremen
                                        L. Cline / Intel
                                        G. Deisher / Intel
                                        T. Gardos / Intel
                                        C. Maciocco / Intel
                                        D. Newell / Intel
                                        J. Ott / Univ. Bremen
                                        G. Sullivan / PictureTel
                                        S. Wenger / TU Berlin
                                        C. Zhu / Intel

Date Generated: 14 Jan. 1998

              RTP Payload Format for the 1998 Version of
                   ITU-T Rec. H.263 Video (H.263+)


Status of This Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or made obsolete by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
ftp.isi.edu (US West Coast).

Distribution of this document is unlimited.

1. Introduction

This document specifies an RTP payload header format applicable to the
transmission of video streams generated based on the 1998 version of
ITU-T Recommendation H.263 [4]. Because the 1998 version of H.263 is a
superset of the 1996 syntax, this format can also be used with the 1996
version of H.263.

The 1998 version of ITU-T Recommendation H.263 added numerous coding
options to improve codec performance over the 1996 version. The 1998
version is referred to as H.263+ in this document. Among the new
options, the ones with the biggest impact on the RTP payload
specification and the error resilience of the video content are the
slice structured mode, the independent segment decoding mode (ISD), the
reference picture selection mode, and the scalability mode. This
section summarizes the impact of these new coding options on
packetization. Refer to [4] for more information on coding options.

The slice structured mode was added to H.263+ for three purposes: to
provide enhanced error resilience capability, to make the bitstream more
amenable to use with an underlying packet transport such as RTP, and to
minimize video delay. The slice structured mode supports fragmentation
at macroblock boundaries.

With the independent segment decoding option, a video picture frame is
broken into segments and encoded in such a way that each segment is
independently decodable. Utilizing ISD in a lossy network environment
helps to prevent the propagation of errors from one segment of the
picture to others.

The reference picture selection mode allows the use of an older
reference picture rather than the one immediately preceding the current
picture. Usually, the last transmitted frame is implicitly used as the
reference picture for inter-frame prediction. If the reference picture
selection mode is used, the data stream carries information on what
reference frame should be used, indicated by the temporal reference as
an ID for that reference frame. The reference picture selection mode
can be used with or without a back channel, which provides information
to the encoder about the internal status of the decoder. However, no
special provision is made herein for carrying back channel information.

H.263+ also includes bitstream scalability as an optional coding mode.
Three kinds of scalability are defined: temporal, signal-to-noise ratio
(SNR), and spatial scalability. Temporal scalability is achieved via
the disposable nature of bi-directionally predicted frames, or B-frames.
SNR scalability permits refinement of encoded video frames, thereby
improving the quality (or SNR). Spatial scalability is similar to SNR
scalability except the refinement layer is twice the size of the base
layer in the horizontal dimension, vertical dimension, or both.

2. Usage of RTP

When transmitting H.263+ video streams over the Internet, the output of
the encoder can be packetized directly.
All the bits of the bitstream, including the fixed length codes and
variable length codes, will be included in the packet, with the only
exception being that when the payload of a packet begins with a
Picture, GOB, Slice, EOS, or EOSBS start code, the first two (all-zero)
bytes of the start code are removed and replaced by setting an
indicator bit in the payload header.

For H.263+ bitstreams coded with temporal, spatial, or SNR scalability,
each layer may be transported to a different network address. More
specifically, each layer may use a unique IP address and port number
combination. The temporal relations between layers shall be expressed
using the RTP timestamp so that they can be synchronized at the
receiving ends in multicast or unicast applications.

The H.263+ video stream will be carried as payload data within RTP
packets. A new H.263+ payload header is defined in section 4. This
section defines the usage of the RTP fixed header and H.263+ video
packet structure.

2.1 RTP Header Usage

Each RTP packet starts with a fixed RTP header. The following fields of
the RTP fixed header are used for H.263+ video streams:

Marker bit (M bit): The Marker bit of the RTP header is set to 1 when
the current packet carries the end of the current frame, and is 0
otherwise.

Payload Type (PT): The Payload Type shall specify the H.263+ video
payload format.

Timestamp: The RTP Timestamp encodes the sampling instant of the first
video frame data contained in the RTP data packet. The RTP timestamp
shall be the same on successive packets if a video frame occupies more
than one packet. In a multilayer scenario, all pictures corresponding
to the same temporal reference should use the same timestamp. If
temporal scalability is used (if B-frames are present), the timestamp
may not be monotonically increasing in the RTP stream.
If B-frames are transmitted on a separate layer and address, they must
be synchronized properly with the reference frames. Refer to the 1998
ITU-T Recommendation H.263 [4] for information on required transmission
order to a decoder. For an H.263+ video stream, the RTP timestamp is
based on a 90 kHz clock, the same as that of the RTP payload for H.261
streams [5]. Since both the H.263+ data and the RTP header contain time
information, it is required that the two sets of timing information run
synchronously. That is, both the RTP timestamp and the temporal
reference (TR in the picture header of H.263) should carry the same
relative timing information. If necessary, mathematical rounding should
be applied to the information of the H.263+ data stream to generate the
RTP timestamp (this is especially true for the standard picture clock
frequency of 30000/1001 Hz, and may also be true if custom picture clock
frequencies are to be used; see [4] for details).

2.2 Video Packet Structure

A section of an H.263+ compressed bitstream is carried as a payload
within each RTP packet. For each RTP packet, the RTP header is followed
by an H.263+ payload header, which is followed by a number of bytes of a
standard H.263+ compressed bitstream. The size of the H.263+ payload
header is variable depending on the payload involved, as detailed in
section 4. The layout of the RTP H.263+ video packet is shown as:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    RTP Header                                               ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    H.263+ Payload Header                                    ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    H.263+ Compressed Data Stream                            ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Any H.263+ start codes can be byte aligned by an encoder by using the
stuffing mechanisms of H.263+. As specified in H.263+, picture, slice,
and EOSBS start codes shall always be byte aligned, and GOB and EOS
start codes may be byte aligned. For packetization purposes, GOB start
codes should be byte aligned, although this is not absolutely required
herein since it is not required in H.263+.

All H.263+ start codes (Picture, GOB, Slice, EOS, and EOSBS) begin with
16 zero-valued bits. If a start code is byte aligned and it occurs at
the beginning of a packet, these two bytes shall be removed from the
H.263+ compressed data stream in the packetization process and shall
instead be represented by setting a bit (the P bit) in the payload
header.

3. Design Considerations

The goals of this payload format are to specify an efficient way of
encapsulating an H.263+ standard compliant bitstream and to enhance the
resiliency towards packet losses. Due to the large number of different
possible coding schemes in H.263+, a copy of the picture header with
configuration information is inserted into the payload header when
appropriate. The use of that copy of the picture header along with the
payload data can allow decoding of a received packet even in such cases
in which another packet containing the original picture header becomes
lost.

There are a few assumptions and constraints associated with this H.263+
payload header design. The purpose of this section is to point out
various design issues and also to discuss several coding options
provided by H.263+ that may impact the performance of network-based
H.263+ video.

o The optional slice structured mode described in annex K of H.263+ [4]
  enables more flexibility for packetization.
  Similar to a picture segment that begins with a GOB header, the
  motion vector predictors in a slice are restricted to reside within
  its boundaries. However, slices provide much greater freedom in the
  selection of the size and shape of the area which is represented as a
  distinct decodable region. In particular, slices can have a size
  which is dynamically selected to allow the data for each slice to fit
  into a chosen packet size. Slices can also be chosen to have a
  rectangular shape which is conducive to minimizing the impact of
  errors and packet losses on motion compensated prediction. For these
  reasons, the use of the slice structured mode is strongly recommended
  for any applications used in environments where significant packet
  loss occurs.

o In non-rectangular slice structured mode, only complete slices should
  be included in a packet. In other words, slices should not be
  fragmented across packet boundaries. The only reasonable need for a
  slice to be fragmented across packet boundaries is when the encoder
  which generated the H.263+ data stream could not be influenced by an
  awareness of the packetization process (such as when sending H.263+
  data through a network other than the one to which the encoder is
  attached, as in network gateway implementations). Optimally, each
  packet will contain only one slice.

o The independent segment decoding (ISD) described in annex R of [4]
  prevents any data dependency across slice or GOB boundaries in the
  reference picture. It can be utilized to further improve resiliency
  in high loss conditions.

o If ISD is used in conjunction with the slice structure, the
  rectangular slice submode shall be enabled and the dimensions and
  quantity of the slices present in a frame shall remain the same
  between each two intra-coded frames (I-frames), as required in H.263+.
  The individual ISD segments may also be entirely intra coded from time
  to time to realize quick error recovery without adding the latency
  time associated with sending complete INTRA-pictures.

o When the slice structure is not applied, the insertion of a
  (preferably byte-aligned) GOB header can be used to provide resync
  boundaries in the bitstream, as the presence of a GOB header
  eliminates the dependency of motion vector prediction across GOB
  boundaries. These resync boundaries provide natural locations for
  packet payload boundaries.

o H.263+ allows picture headers to be sent in an abbreviated form in
  order to prevent repetition of overhead information that does not
  change from picture to picture. For resiliency, sending a complete
  picture header for every frame is often advisable. This means that,
  especially in cases with high packet loss probability in which picture
  header contents are not expected to be highly predictable, the sender
  may always set the subfield UFEP in PLUSPTYPE to '001' in the H.263+
  video bitstream.

o In a multi-layer scenario, each layer may be transmitted to a
  different network address. The configuration of each layer such as
  the enhancement layer number (ELNUM), reference layer number (RLNUM),
  and scalability type should be determined at the start of the session
  and should not change during the course of the session.

o All start codes can be byte aligned, and picture, slice, and EOSBS
  start codes are always byte aligned. The boundaries of these
  syntactical elements provide ideal locations for placing packet
  boundaries.

o We assume that a maximum Picture Header size of 504 bits is
  sufficient. The syntax of H.263+ does not explicitly prohibit larger
  picture header sizes, but the use of such extremely large picture
  headers is not expected.

4. H.263+ Payload Header

For H.263+ video streams, each RTP packet carries only one H.263+ video
packet. The H.263+ payload header is always present for each H.263+
video packet. The payload header is of variable length. A 16 bit field
of the basic payload header may be followed by an 8 bit field for Video
Redundancy Coding information, and/or by a variable length picture
header as indicated by PLEN. These optional fields appear in the order
given above when present.

If a picture header is included in the payload header, the length of the
picture header in number of bytes is specified by PLEN. The minimum
length of the payload header is 16 bits, corresponding to PLEN equal to
0 and no VRC information present.

The remainder of this section defines the various components of the RTP
payload header. Section 5 defines the various packet types that are
used to carry different types of H.263+ coded data, and section 6
summarizes how to distinguish between the various packet types.

4.1 General H.263+ payload header

The H.263+ payload header is structured as follows:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   RR    |P|V|   PLEN    |PEBIT|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

RR: 5 bits
  Reserved bits. Shall be zero.

P: 1 bit
  Indicates the picture start or a picture segment (GOB/Slice) start or
  a video sequence end (EOS or EOSBS). Two bytes of zero bits then have
  to be prefixed to the payload of such a packet to compose a complete
  picture/GOB/slice/EOS/EOSBS start code. This bit allows the omission
  of the first two bytes of the start codes, thus improving the
  compression ratio.

V: 1 bit
  Indicates the presence of an 8 bit field containing information for
  Video Redundancy Coding (VRC), which follows immediately after the
  initial 16 bits of the payload header if present.
  For the syntax and semantics of that 8 bit VRC field see section 4.2.

PLEN: 6 bits
  Picture header length in number of bytes. If no additional picture
  header is attached, PLEN is 0. If PLEN>0, the additional picture
  header is attached immediately following the rest of the payload
  header.

PEBIT: 3 bits
  Indicates the number of bits that shall be ignored in the last byte of
  the picture header. If PLEN is zero, then PEBIT shall also be zero.

4.2 Video Redundancy Coding Header Extension

Video Redundancy Coding (VRC) is an optional mechanism intended to
improve error resilience over packet networks. Implementing VRC in
H.263+ will require the Reference Picture Selection option described in
Annex N. By having multiple "threads" of independently inter-frame
predicted pictures, damage to an individual frame will cause distortions
only within its own thread but leave the other threads unaffected. From
time to time, all threads converge to a so-called sync frame (an INTRA
picture or a non-INTRA picture which is redundantly represented within
multiple threads); from this sync frame, the independent threads are
started again. For a more complete description of VRC see [7].

While a VRC data stream is - like all H.263+ data - totally
self-contained, it may be useful for the transport hierarchy
implementation to have knowledge about the current damage status of each
thread. On the Internet, this status can easily be determined by
observing the marker bit, the sequence number of the RTP header, and the
thread-id and a circling "packet per thread" number. The latter two
numbers are coded in the VRC header extension.

The format of the VRC header extension is as follows:

    0 1 2 3 4 5 6 7
   +-+-+-+-+-+-+-+-+
   | TID | Trun  |S|
   +-+-+-+-+-+-+-+-+

TID: 3 bits
  Thread ID. Up to 7 threads are allowed.
  Each frame of H.263+ VRC data will use as reference information only
  sync frames or frames within the same thread. By convention, thread 0
  is expected to be the "canonical" thread, which is the thread from
  which the sync frame should ideally be used. In the case of
  corruption or loss of the thread 0 representation, a representation of
  the sync frame with a higher thread number can be used by the decoder.
  Lower thread numbers are expected to contain equal or better
  representations of the sync frames than higher thread numbers in the
  absence of data corruption or loss. See [7] for details.

Trun: 4 bits
  Monotonically increasing (modulo 16) 4 bit number counting the packet
  number within each thread.

S: 1 bit
  A bit that indicates that the packet content is for a sync frame. An
  encoder using VRC may send several representations of the same "sync"
  picture, in order to ensure that regardless of which thread of
  pictures is corrupted by errors or packet losses, the reception of at
  least one representation of a particular picture is ensured (within at
  least one thread). The sync picture can then be used for the
  prediction of any thread. If packet losses have not occurred, then
  the sync frame contents of thread 0 can be used and those of other
  threads can be discarded (and similarly for other threads). Thread 0
  is considered the "canonical" thread, the use of which is preferable
  to all others. The contents of packets having lower thread numbers
  shall be considered as generally preferred over those with higher
  thread numbers.

5. Packetization schemes

5.1 Picture Segment Packets and Sequence Ending Packets (P=1)

A picture segment packet is defined as a packet that starts at the
location of a Picture, GOB, or slice start code in the H.263+ data
stream. This corresponds to the definition of the start of a video
picture segment as defined in H.263+.
For such packets, P=1 always.

An extra picture header can sometimes be attached in the payload header
of such packets. Whenever an extra picture header is attached as
signified by PLEN>0, only the last six bits of its picture start code,
'100000', are included in the payload header. A complete H.263+ picture
header with byte aligned picture start code can be conveniently
assembled on the receiving end by prepending the sixteen leading '0'
bits.

When PLEN>0, the end bit position corresponding to the last byte of the
picture header data is indicated by PEBIT. The actual bitstream data
shall begin on an 8-bit byte boundary following the payload header.

A sequence ending packet is defined as a packet that starts at the
location of an EOS or EOSBS code in the H.263+ data stream. This
delineates the end of a sequence of H.263+ video data (more H.263+ video
data may still follow later, however, as specified in ITU-T
Recommendation H.263). For such packets, P=1 and PLEN=0 always.

The optional header extension for VRC may or may not be present as
indicated by the V bit flag.

5.1.1 Packets that begin with a Picture Start Code

Any packet that contains the whole or the start of a coded picture shall
start at the location of the picture start code (PSC), and should
normally be encapsulated with no extra copy of the picture header. In
other words, normally PLEN=0 in such a case. However, if the coded
picture contains an incomplete picture header (UFEP = "000"), then a
representation of the complete (UFEP = "001") picture header may be
attached during packetization in order to provide greater error
resilience.
Thus, for packets that start at the location of a picture start code,
PLEN shall be zero unless both of the following conditions apply:

  1) The picture header in the H.263+ bitstream payload is incomplete
     (PLUSPTYPE present and UFEP="000"), and
  2) The additional picture header which is attached is not incomplete
     (UFEP="001").

A packet which begins at the location of a Picture, GOB, slice, EOS, or
EOSBS start code shall omit the first two (all zero) bytes from the
H.263+ bitstream, and signify their presence by setting P=1 in the
payload header.

Here is an example of encapsulating the first packet in a frame (without
an attached redundant complete picture header):

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   RR    |1|V|0|0|0|0|0|0|0|0|0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-------------------------------+
   | bitstream data without the first two 0 bytes of the PSC       |
   +---------------------------------------------------------------+

5.1.2 Packets that begin with GBSC or SSC

For a packet that begins at the location of a GOB or slice start code,
PLEN may be zero or may be nonzero, depending on whether a redundant
picture header is attached to the packet. In environments with very low
packet loss rates, or when picture header contents are very seldom
likely to change (except as can be detected from the GFID syntax of
H.263+), a redundant copy of the picture header is not required.
However, in less ideal circumstances a redundant picture header should
be attached for enhanced error resilience, and its presence is indicated
by PLEN>0.
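
As a rough illustration of how a receiver might handle the payload
header fields described so far, the following Python sketch splits a
payload into the 16 bit header fields, the optional VRC byte, the
optional attached picture header, and the bitstream data, restoring the
two all-zero bytes when P=1. The names ParsedPayload and parse_payload
are hypothetical and not part of this specification, and error handling
is reduced to a few length and consistency checks.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ParsedPayload:
    """Hypothetical container for one H.263+ RTP payload (illustration only)."""
    p: int                                # P bit: start code prefix was removed
    v: int                                # V bit: VRC byte present
    plen: int                             # length of attached picture header, bytes
    pebit: int                            # bits to ignore in its last byte
    vrc: Optional[Tuple[int, int, int]]   # (TID, Trun, S) when V=1
    picture_header: bytes                 # attached copy, begins with '100000'
    bitstream: bytes                      # H.263+ data with start code restored


def parse_payload(payload: bytes) -> ParsedPayload:
    if len(payload) < 2:
        raise ValueError("payload shorter than the 16 bit payload header")

    # Bit 0 of the diagram is the most significant bit of the first byte.
    word = int.from_bytes(payload[:2], "big")
    p = (word >> 10) & 0x1        # bit 5
    v = (word >> 9) & 0x1         # bit 6
    plen = (word >> 3) & 0x3F     # bits 7..12
    pebit = word & 0x7            # bits 13..15
    if plen == 0 and pebit != 0:
        raise ValueError("PEBIT shall be zero when PLEN is zero")

    pos = 2
    vrc = None
    if v:
        if len(payload) < pos + 1:
            raise ValueError("V=1 but the VRC byte is missing")
        b = payload[pos]
        vrc = (b >> 5, (b >> 1) & 0xF, b & 0x1)   # TID, Trun, S
        pos += 1

    if len(payload) < pos + plen:
        raise ValueError("PLEN exceeds the remaining payload")
    picture_header = payload[pos:pos + plen]
    pos += plen

    data = payload[pos:]
    if p:
        # Re-insert the 16 zero bits that the sender removed.
        data = b"\x00\x00" + data
    return ParsedPayload(p, v, plen, pebit, vrc, picture_header, data)
```

A decoder wishing to use the attached picture header on its own would,
as described in section 5.1, also prepend sixteen '0' bits to it; that
step is left out of the sketch.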

Assuming a PLEN of 9, below is an example of a packet that begins with a
GBSC or an SSC:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   RR    |1|V|0 0 1 0 0 1|PEBIT|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1 0 0 0 0 0| picture header starting with TR, PTYPE, ...       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...                                                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | ...           | bitstream data begins with GBSC/SSC ...       .
   +-+-+-+-+-+-+-+-+-----------------------------------------------+

Notice that only the last six bits of the picture start code, '100000',
are included in the payload header. A complete H.263+ picture header
with byte aligned picture start code can be conveniently assembled if
needed on the receiving end by prepending the sixteen leading '0' bits.

5.1.3 Packets that Begin with an EOS or EOSBS Code

For a packet that begins with an EOS or EOSBS code, PLEN shall be zero,
and no Picture, GOB, or Slice start codes shall be included within the
same packet. As with other packets beginning with start codes, the two
all-zero bytes that begin the EOS or EOSBS code at the beginning of the
packet shall be omitted, and their presence shall be indicated by
setting the P bit to 1 in the payload header.

System designers should be aware that some decoders may interpret the
loss of a packet containing only EOS or EOSBS information as the loss of
essential video data and may thus respond by not displaying some
subsequent video information. Since EOS and EOSBS codes do not actually
affect the decoding of video pictures, it is largely unnecessary to
send them at all.
Because of the danger of misinterpretation of the loss of such a packet,
encoders are generally discouraged from sending EOS and EOSBS.

Below is an example of a packet containing an EOS code:

    0                   1
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   RR    |1|V|0|0|0|0|0|0|0|0|0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1|1|1|1|1|1|0|0|
   +-+-+-+-+-+-+-+-+

5.2 Encapsulating Follow-On Packet (P=0)

A Follow-on packet contains a number of bytes of coded H.263+ data which
does not start at a synchronization point. That is, a Follow-on packet
does not start with a Picture, GOB, Slice, EOS, or EOSBS header, and it
may or may not start at a macroblock boundary. Since Follow-on packets
do not start at synchronization points, the data at the beginning of a
Follow-on packet is not independently decodable. For such packets, P=0
always. If the packet preceding a Follow-on packet is lost, the
receiver may discard that Follow-on packet as well as all subsequent
Follow-on packets. Better behavior, of course, would be for the
receiver to scan the interior of the packet payload content to
determine whether any start codes are found in the interior of the
packet which can be used as resync points. The use of an attached copy
of a picture header for a Follow-on packet is useful only if the
interior of the packet or some subsequent Follow-on packet contains a
resync code such as a GOB or slice start code. PLEN>0 is allowed, since
it may allow resync in the interior of the packet. The decoder may also
be resynchronized at the next segment or picture packet.
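
The interior scan mentioned above could be sketched as follows in
Python. This is only an illustration under simplifying assumptions: it
looks solely for byte aligned start codes (two all-zero bytes followed
by a byte whose first bit is '1'), so a GOB or EOS start code that is
not byte aligned, or one that straddles a packet boundary, would not be
found. The function name find_aligned_start_codes is hypothetical.

```python
from typing import List


def find_aligned_start_codes(data: bytes) -> List[int]:
    """Return byte offsets where a byte aligned start code prefix begins.

    A hit is 16 zero bits followed by a '1' bit, which is the common
    prefix of the Picture, GOB, Slice, EOS, and EOSBS start codes.
    Assumption: only byte aligned positions are examined.
    """
    offsets = []
    for i in range(len(data) - 2):
        if data[i] == 0 and data[i + 1] == 0 and (data[i + 2] & 0x80):
            offsets.append(i)
    return offsets


# Example: one aligned start code prefix at offset 2; the zero bytes at
# offset 6 are followed by 0x10, whose first bit is 0, so they are skipped.
payload = bytes([0x12, 0x34, 0x00, 0x00, 0x84, 0x56, 0x00, 0x00, 0x10])
resync_points = find_aligned_start_codes(payload)
```

A receiver that has lost the preceding packet could resume decoding at
the first offset returned, provided the picture header information it
needs is available, for instance from an attached copy (PLEN>0).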

Here is an example of a follow-on packet (with PLEN=0):

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   RR    |0|V|0|0|0|0|0|0|0|0|0|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-------------------------------+
   | bitstream data                                                |
   +---------------------------------------------------------------+

6. Use of this payload specification

There is no syntactical difference between a picture segment packet and
a Follow-on packet, other than the indication P=1 for picture segment or
sequence ending packets and P=0 for Follow-on packets. See the
following for a summary of all the packet types and ways to distinguish
between them.

For a more detailed discussion on how to use the payload specification,
the reader should refer to [8].

It is possible to distinguish between the different packet types by
checking the P bit and the first 6 bits of the payload along with the
header information. The following table shows the packet type for
permutations of this information (see also the picture/GOB/Slice header
descriptions in H.263+ for details):

--------------+--------------+------+---------------------+-----------------
 First 6 bits | P-Bit        | PLEN | Packet              | Remarks
 of Payload   |(payload hdr.)|      |                     |
--------------+--------------+------+---------------------+-----------------
 100000       | 1            | 0    | Picture             | Typical Picture
 100000       | 1            | > 0  | Picture             | Note UFEP
 1xxxxx       | 1            | 0    | GOB/Slice/EOS/EOSBS | See possible GNs
 1xxxxx       | 1            | > 0  | GOB/Slice           | See possible GNs
 xxxxxx       | 0            | 0    | Follow-on           |
 xxxxxx       | 0            | > 0  | Follow-on           | Interior Resync
--------------+--------------+------+---------------------+-----------------

See [4] for details regarding the possible values of the six bits (a "1"
bit followed by a five bit GN field explicit or emulated) of GOB, Slice,
EOS, and EOSBS codes.
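
The table can be read as a small decision procedure. The Python sketch
below is one possible rendering of it, not a normative algorithm: the
function name classify_packet and the returned labels are hypothetical,
and the caller is assumed to have already extracted the P bit, PLEN, and
the first byte of the bitstream data that follows the payload header.

```python
def classify_packet(p_bit: int, plen: int, first_payload_byte: int) -> str:
    """Map (P bit, PLEN, first 6 payload bits) to a packet type label."""
    first6 = (first_payload_byte >> 2) & 0x3F   # first six bits of the payload

    if p_bit == 0:
        # Any bit pattern is possible; PLEN>0 only helps interior resync.
        return "Follow-on"

    if first6 == 0b100000:
        # Remainder of the picture start code.
        return "Picture"

    if first6 & 0b100000:
        # '1' followed by an explicit or emulated five bit GN field.
        # EOS and EOSBS packets always have PLEN=0 (section 5.1).
        return "GOB/Slice/EOS/EOSBS" if plen == 0 else "GOB/Slice"

    # With P=1 the payload continues a start code, whose 17th bit is '1'.
    raise ValueError("P=1 but payload does not continue a start code")


# Example: P=1, PLEN=0, payload begins with 0xFC ('111111' + 2 more bits),
# which matches the EOS example of section 5.1.3.
kind = classify_packet(1, 0, 0xFC)
```

Distinguishing between GOB, Slice, EOS, and EOSBS within the combined
row requires inspecting the GN field values given in [4], which the
sketch does not attempt.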
As defined in this specification, every start of a coded frame (as
indicated by the presence of a PSC) has to be encapsulated as a picture
segment packet. If the whole coded picture fits into one packet of
reasonable size (which is dependent on the connection characteristics),
this is the only type of packet that is needed. Due to the high
compression ratio achieved by H.263+, it is often possible to use this
mechanism, especially for small spatial picture formats such as QCIF
and typical Internet packet sizes around 1500 bytes.

If the complete coded frame does not fit into a single packet, the
packetization may be done in one of two ways. If the packet loss
probability is very low or zero, one or more Follow-on packets may be
used to carry the rest of the picture. Doing so leads to minimal coding
and packetization overhead as well as to an optimal use of the maximal
packet size, but does not provide any added error resilience.

The alternative is to break the picture into reasonably small
partitions, called Segments, by using the Slice or GOB mechanism; these
do offer synchronization points. By doing so and using the Picture
Segment payload with PLEN>0, decoding of the transmitted packets is
possible even when the Picture packet containing the picture header was
lost (provided any necessary reference picture is available). Picture
Segment packets can also be used in conjunction with Follow-on packets
for large segment sizes.

7. Security Considerations

RTP packets using the payload format defined in this specification are
subject to the security considerations discussed in the RTP
specification [1], and any appropriate RTP profile (for example [3]).
This implies that confidentiality of the media streams is achieved by
encryption.
Because the data compression used with this payload format
is applied end-to-end, encryption may be performed after compression so
there is no conflict between the two operations.

A potential denial-of-service threat exists for data encodings using
compression techniques that have non-uniform receiver-end computational
load. The attacker can inject pathological datagrams into the stream
which are complex to decode and cause the receiver to be overloaded.
However, this encoding does not exhibit any significant non-uniformity.

As with any IP-based protocol, in some circumstances a receiver may be
overloaded simply by the receipt of too many packets, either desired or
undesired. Network-layer authentication may be used to discard packets
from undesired sources, but the processing cost of the authentication
itself may be too high. In a multicast environment, pruning of specific
sources may be implemented in future versions of IGMP [5] and in
multicast routing protocols to allow a receiver to select which sources
are allowed to reach it.

A security review of this payload format found no additional
considerations beyond those in the RTP specification.

8. References

[1] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A
    Transport Protocol for Real-Time Applications", RFC 1889.

[2] "Video Codec for Audiovisual Services at px64 kbits/s", ITU-T
    Recommendation H.261, 1993.

[3] H. Schulzrinne, "RTP Profile for Audio and Video Conferences with
    Minimal Control", RFC 1890.

[4] "Video Coding for Low Bitrate Communication", Draft ITU-T
    Recommendation H.263, Draft 20, September 1997.

[5] T. Turletti, C. Huitema, "RTP Payload Format for H.261 Video
    Streams", RFC 2032.

[6] C. Zhu, "RTP Payload Format for H.263 Video Streams", RFC 2190.

[7] S. Wenger, "Video Redundancy Coding in H.263+", Proc. AVSPN97,
    Aberdeen, U.K.

[8] S. Wenger, G. Knorr, J. Ott, "Error resilience support in H.263
    V.2", submitted for publication to IEEE T-CSVT, 1997.