Internet Engineering Task Force                 Audio-Video Transport WG
INTERNET-DRAFT                                 C. Bormann / Univ. Bremen
                                                        L. Cline / Intel
                                                      G. Deisher / Intel
                                                       T. Gardos / Intel
                                                     C. Maciocco / Intel
                                                       D. Newell / Intel
                                                   J. Ott / Univ. Bremen
                                                   S. Wenger / TU Berlin
                                                          C. Zhu / Intel

               RTP Payload Format for the 1998 Version of
                    ITU-T Rec. H.263 Video (H.263+)

Status of This Memo

   This document is an Internet-Draft. Internet-Drafts are working
   documents of the Internet Engineering Task Force (IETF), its areas,
   and its working groups. Note that other groups may also distribute
   working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or made obsolete by other documents at
   any time. It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   To learn the current status of any Internet-Draft, please check the
   "1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
   Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
   munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
   ftp.isi.edu (US West Coast).

   Distribution of this document is unlimited.

1. Introduction

   This document specifies an RTP payload header format for the
   transport of video streams generated according to the 1998 version
   of ITU-T Recommendation H.263.

   The 1998 version of ITU-T Recommendation H.263 added numerous coding
   options to improve codec performance over the 1996 version. The 1998
   version is referred to as H.263+ in this document. Among the new
   options, those with the biggest impact on the RTP payload format are
   the slice structured mode (SS), the independent segment decoding mode
   (ISD), and the scalability mode. This section summarizes the impact
   of these new coding options on packetization. Refer to [4] for more
   information on the coding options.

   Slice structure was added to H.263+ for three purposes: to provide
   enhanced error resilience, to make the bitstream more amenable to use
   with an underlying packet transport such as RTP, and to minimize
   video delay. The slice structured mode supports fragmentation at
   macroblock boundaries.

   When the independent segment decoding option is employed, a video
   picture is divided into segments, each encoded so that it can be
   decoded independently. Using ISD in a lossy network environment
   helps prevent errors from propagating from one segment of the picture
   to others.

   H.263+ also includes bitstream scalability as an optional coding
   mode. Three kinds of scalability are defined: temporal,
   signal-to-noise ratio (SNR), and spatial scalability. Temporal
   scalability is achieved via the disposable nature of bi-directionally
   predicted frames, or B-frames. SNR scalability permits refinement of
   encoded video frames, thereby improving the quality (or SNR). Spatial
   scalability is similar to SNR scalability, except that the refinement
   layer is twice the size of the base layer in the horizontal
   dimension, the vertical dimension, or both.

2. Usage of RTP

   When transmitting H.263+ video streams over the Internet, the output
   of the encoder can be packetized directly.
   All bits of the bitstream, including the fixed-length and
   variable-length codes, are included in the packet.

   For H.263+ bitstreams coded with temporal, spatial, or SNR
   scalability, each layer may be transported to a different network
   address. More specifically, each layer may use a unique combination
   of IP address and port. In addition, temporal relations between
   layers shall be expressed using the RTP timestamp so that the layers
   can be synchronized at the receiving end in multicast or unicast
   applications.

   The H.263+ video streams are carried as payload data within RTP
   packets. A new payload header, the H.263+ payload header, is defined
   in Section 4. This section defines the usage of the RTP fixed header
   and the H.263+ video packet structure.

2.1 RTP Header Usage

   Each RTP packet starts with a fixed RTP header. The following fields
   of the RTP fixed header are used for H.263+ video streams:

   Marker bit (M bit): The marker bit of the RTP header is set to 1 when
   the current packet carries the end of the current frame, and is 0
   otherwise.

   Payload Type (PT): The payload type shall specify the H.263+ video
   payload format. A dynamic payload type can be used until a static
   payload type is assigned.

   Timestamp: The RTP timestamp encodes the sampling instant of the
   first video frame contained in the RTP data packet. The RTP timestamp
   may be the same on successive packets if a video frame occupies more
   than one packet. In a multilayer scenario, all pictures corresponding
   to the same temporal reference should carry the same timestamp. If
   temporal scalability is used and B-frames are present, the timestamp
   may not increase monotonically in the video stream. If B-frames are
   transmitted on a separate layer and address, they must be
   synchronized properly with the reference frames.
   Refer to ITU-T Recommendation H.263 (1998) [4] for the transmission
   order required by a decoder. For an H.263+ video stream, the RTP
   timestamp is based on a 90 kHz clock, the same clock used by the RTP
   payload format for H.261 video streams [5].

2.2 Video Packet Structure

   An H.263+ compressed bitstream is carried as the payload of each RTP
   packet. In each RTP packet, the RTP header is followed by an H.263+
   payload header, which is followed by a standard H.263+ compressed
   bitstream. The size of the H.263+ payload header varies depending on
   the payload involved, as detailed in Section 4. The layout of an RTP
   H.263+ video packet is shown below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | RTP Header ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | H.263+ Payload Header ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | H.263+ Compressed Data Stream ...
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3. Design Considerations

   The goal of this payload format is to specify an efficient way of
   encapsulating an H.263+ standard-compliant bitstream and to enhance
   its resilience to packet loss. Because of the large number of
   possible coding schemes in H.263+, a copy of the picture header with
   configuration information is inserted into the payload header when
   appropriate.

   There are a few assumptions and constraints associated with this
   H.263+ payload header design. The purpose of this section is to point
   out various design issues and to discuss several coding options
   provided by H.263+ that may affect the performance of network video.

   o It is reasonable to assume that no single macroblock will be too
     large to fit in a packet.

   o The optional slice structured mode described in Annex K of H.263+
     [4] provides more flexibility for packetization. Furthermore,
     packets based on a slice structure are inherently more loss
     resilient. As in a picture segment that begins with a GOB header,
     the motion vector predictors in a slice are restricted to lie within
     the slice boundaries. For these reasons, use of the slice structured
     mode is strongly recommended for network applications.

   o In non-rectangular slice structured mode, only complete slices
     should be included in a packet. In other words, slices should not
     be fragmented across packets. Optimally, a packet will contain only
     one slice.

   o When the slice structured mode is not used, inserting a GOB header
     in every GOB is recommended to reduce the dependency on motion
     vector prediction across GOBs. See Section 3.3 of [6] for more
     information.

   o The independent segment decoding mode described in Annex R of [4]
     does not allow any data dependency across slice or GOB boundaries
     in the reference picture. It can be used to further improve
     resilience in high-loss conditions.

   o If ISD is used in conjunction with the slice structured mode, the
     rectangular slice submode shall be enabled, and the dimensions and
     number of the slices present in a frame shall remain the same
     between two intra-coded frames (I-frames). The ISD segments may be
     entirely intra coded from time to time to achieve quick error
     recovery without the added latency of sending complete I-frames.

   o For resiliency, sending a full picture header for every frame is
     recommended. In other words, the sender should always set the
     subfield UFEP in PLUSPTYPE to '001' in the video bitstream.

   o In a multi-layer scenario, each layer can be transmitted to a
     different network address.
     The configuration of each layer, such as the enhancement layer
     number (ELNUM), reference layer number (RLNUM), and scalability
     type, should be determined at the start of the session and should
     not change during the course of the session.

4. H.263+ Payload Header

   For H.263+ video streams, each RTP packet carries only one H.263+
   video packet. The H.263+ payload header is always present for each
   H.263+ video packet. The payload header has variable length. If a
   picture header is included in the payload header, the length of the
   picture header in bytes is specified by PLEN. The minimum length of
   the payload header is 32 bits, corresponding to PLEN equal to 0.

   The H.263+ payload header is structured as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=0|SBIT |EBIT |  PLEN   |PEBIT| TID | Trun  |       RR        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1 0 0 0 0 0| picture header starting with TR, PTYPE, ... .
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   V: 2 bits
      Version number. Set to '00' for this payload format.
      [Ed. Note: The version control will not take effect until a draft
      has been formally submitted to the IETF.]

   SBIT: 3 bits
      Start bit position: the number of bits that should be ignored in
      the first data byte of the payload.

   EBIT: 3 bits
      End bit position: the number of bits that should be ignored in the
      last data byte of the payload.

   PLEN: 5 bits
      Length of the picture header in bytes.

   PEBIT: 3 bits
      Picture header end bit position: the number of bits that should be
      ignored in the last byte of the picture header.

   TID: 3 bits
      Thread ID. Used only in the optional video redundancy coding (VRC)
      mode; see Annex N of [4]. All three bits must be set to 0 unless
      VRC mode is applied.

   Trun: 4 bits
      Cyclic packet number. Used only in the optional VRC mode. These
      bits must be set to 0 unless VRC mode is applied.

   RR: 9 bits
      Reserved bits.

   Note that the TID and Trun fields are associated only with the video
   redundancy coding usage scenario derived from the reference picture
   selection mode specified in Annex N of [4]. The TID and Trun bits
   must be set to 0 if VRC is not used. The use of VRC shall be
   negotiated by external means.

4.1 Encapsulating a Packet that Begins with a PSC

   Any packet that begins with a picture start code (PSC), i.e., the
   first packet of a picture frame, shall be encapsulated using only the
   first 32-bit word of the payload header, since a picture header is
   already included in the bitstream data. In this case, PLEN shall be
   0.

   Here is an example of encapsulating the first packet in a frame:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 0|SBIT |EBIT |0 0 0 0 0|0 0 0| TID | Trun  |       RR        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | bitstream data starts with a complete picture header ... .
   +---------------------------------------------------------------+

4.2 Encapsulating a Packet that Begins with a GBSC or SSC

   Any packet that begins with either a GOB start code (GBSC) or a slice
   start code (SSC) shall include a copy of the picture header in the
   payload header for resiliency. PLEN shall be set to the length of the
   included picture header in bytes; hence, PLEN > 0. The end bit
   position corresponding to the last byte of the picture header data
   is indicated by PEBIT. The actual bitstream data shall begin on an
   8-bit byte boundary following the payload header.

   Note that only the last six bits of the picture start code, '100000',
   are included in the payload header.
   A complete H.263+ picture header with a byte-aligned picture start
   code can be conveniently assembled at the receiving end, if needed,
   by prepending the sixteen leading '0' bits.

   Assuming a PLEN of 9, below is an example of a packet that begins
   with a GBSC or an SSC:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 0|SBIT |EBIT |0 1 0 0 1|PEBIT| TID | Trun  |       RR        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |1 0 0 0 0 0| picture header starting with TR, PTYPE, ...       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                              ...                              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |      ...      | bitstream data begins with GBSC/SSC ... .
   +-+-+-+-+-+-+-+-+-----------------------------------------------+

4.3 Encapsulating Follow-On Packets

   When the slice structured coding option is not applied, some GOBs in
   the bitstream may be larger than one packet. Similarly, when the ISD
   option is applied, a picture segment may be larger than the required
   packet size. A packet carrying the remaining fragment of such a
   picture segment is termed a "follow-on" packet in this document.

   Follow-on packets, whose data is fragmented at macroblock boundaries,
   are not independently recoverable. For these packets, the payload
   header includes only the first 32-bit word, and PLEN shall be set to
   0. A receiver should discard any follow-on packet it receives if the
   preceding packet containing the segment header information has been
   lost.
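
   The receiver-side rules of Sections 4.1 through 4.3 can be sketched
   in Python as below. This is a minimal, non-normative sketch under
   stated assumptions, not part of this specification: the names
   PayloadHeader, pack_header, parse_payload, rebuild_picture_header,
   begins_with_psc, and FollowOnFilter are hypothetical; bit 0 of the
   32-bit word is taken as the most significant bit; packets that begin
   with a start code are assumed to have SBIT = 0; and loss detection
   assumes in-order delivery by RTP sequence number. The sketch decodes
   the fixed word, splits off the PLEN-byte picture header copy,
   restores the byte-aligned PSC by prepending two zero bytes, and
   discards follow-on packets after a loss.

```python
import struct
from dataclasses import dataclass


@dataclass
class PayloadHeader:
    """Fixed 32-bit word of the H.263+ payload header (Section 4)."""
    v: int = 0      # 2 bits: version
    sbit: int = 0   # 3 bits: bits to ignore in the first data byte
    ebit: int = 0   # 3 bits: bits to ignore in the last data byte
    plen: int = 0   # 5 bits: picture header length in bytes
    pebit: int = 0  # 3 bits: bits to ignore in the last picture header byte
    tid: int = 0    # 3 bits: VRC thread ID (0 unless VRC is used)
    trun: int = 0   # 4 bits: VRC packet number (0 unless VRC is used)
    rr: int = 0     # 9 bits: reserved


# (field, width) in transmission order; assumption: bit 0 is the MSB.
_FIELDS = [("v", 2), ("sbit", 3), ("ebit", 3), ("plen", 5),
           ("pebit", 3), ("tid", 3), ("trun", 4), ("rr", 9)]


def pack_header(hdr: PayloadHeader) -> bytes:
    """Serialize the fixed 32-bit word in network byte order."""
    word = 0
    for name, width in _FIELDS:
        value = getattr(hdr, name)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return struct.pack("!I", word)


def parse_payload(payload: bytes):
    """Split an RTP payload into (header, picture header copy, data)."""
    if len(payload) < 4:
        raise ValueError("payload shorter than the 32-bit payload header")
    (word,) = struct.unpack("!I", payload[:4])
    values, shift = {}, 32
    for name, width in _FIELDS:
        shift -= width
        values[name] = (word >> shift) & ((1 << width) - 1)
    hdr = PayloadHeader(**values)
    if len(payload) < 4 + hdr.plen:
        raise ValueError("payload shorter than 4 + PLEN bytes")
    return hdr, payload[4:4 + hdr.plen], payload[4 + hdr.plen:]


def rebuild_picture_header(picture_copy: bytes) -> bytes:
    # The copy starts with the last six PSC bits '100000'; prepending
    # sixteen '0' bits (two zero bytes) restores the byte-aligned PSC.
    return b"\x00\x00" + picture_copy


def begins_with_psc(data: bytes) -> bool:
    # PSC: sixteen '0' bits followed by '100000' (22 bits, byte aligned).
    return (len(data) >= 3 and data[0] == 0 and data[1] == 0
            and (data[2] & 0xFC) == 0x80)


class FollowOnFilter:
    """Hypothetical helper: drop follow-on packets after a loss."""

    def __init__(self):
        self.last_seq = None
        self.chain_intact = False

    def accept(self, seq: int, payload: bytes) -> bool:
        hdr, _picture, data = parse_payload(payload)
        contiguous = (self.last_seq is not None
                      and seq == (self.last_seq + 1) & 0xFFFF)
        self.last_seq = seq
        if hdr.plen > 0 or begins_with_psc(data):
            # PSC packet (4.1) or GBSC/SSC packet (4.2): decodable alone.
            self.chain_intact = True
        else:
            # Follow-on packet (4.3): usable only if nothing was lost
            # since the packet carrying the segment header.
            self.chain_intact = self.chain_intact and contiguous
        return self.chain_intact
```

   Treating any sequence-number gap since the last PSC, GBSC, or SSC
   packet as fatal is a deliberately conservative choice; a real
   receiver would also need to tolerate reordering, which this sketch
   does not attempt.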

   Here is an example of a follow-on packet:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 0|SBIT |EBIT |0 0 0 0 0|0 0 0| TID | Trun  |       RR        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | sub-segment bitstream data ... .
   +---------------------------------------------------------------+

   Even though they may have identical payload headers, a follow-on
   packet can be distinguished from the first packet in a frame because
   the data in a follow-on packet does not begin with a PSC.

5. Security Considerations

   RTP packets using the payload format defined in this specification
   are subject to the security considerations discussed in the RTP
   specification [1] and in any appropriate RTP profile (for example,
   [3]). This implies that confidentiality of the media streams is
   achieved by encryption. Because the data compression used with this
   payload format is applied end-to-end, encryption may be performed
   after compression, so there is no conflict between the two
   operations.

   A potential denial-of-service threat exists for data encodings using
   compression techniques that have non-uniform receiver-end
   computational load. An attacker can inject pathological datagrams
   into the stream that are complex to decode and cause the receiver to
   be overloaded. However, this encoding does not exhibit any
   significant non-uniformity.

   As with any IP-based protocol, in some circumstances a receiver may
   be overloaded simply by the receipt of too many packets, either
   desired or undesired. Network-layer authentication may be used to
   discard packets from undesired sources, but the processing cost of
   the authentication itself may be too high.
   In a multicast environment, pruning of specific sources may be
   implemented in future versions of IGMP and in multicast routing
   protocols to allow a receiver to select which sources are allowed to
   reach it.

   A security review of this payload format found no additional
   considerations beyond those in the RTP specification.

6. References

   [1] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A
       Transport Protocol for Real-Time Applications", RFC 1889.

   [2] "Video Codec for Audiovisual Services at px64 kbits/s", ITU-T
       Recommendation H.261, 1993.

   [3] H. Schulzrinne, "RTP Profile for Audio and Video Conferences with
       Minimal Control", RFC 1890.

   [4] "Video Coding for Low Bitrate Communication", Draft ITU-T
       Recommendation H.263, Draft 20, September 1997.

   [5] T. Turletti, C. Huitema, "RTP Payload Format for H.261 Video
       Streams", RFC 2032.

   [6] C. Zhu, "RTP Payload Format for H.263 Video Streams", RFC 2190.