idnits 2.17.1 

draft-ietf-avt-rtp-h263-video-00.txt:

  Checking boilerplate required by RFC 5378 and the IETF Trust (see
  https://trustee.ietf.org/license-info):
  ----------------------------------------------------------------------------

  ** Cannot find the required boilerplate sections (Copyright, IPR, etc.) in
     this document.

     Expected boilerplate is as follows today (2024-04-25) according to
     https://trustee.ietf.org/license-info :

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.a:
        This Internet-Draft is submitted in full conformance with the provisions
        of BCP 78 and BCP 79.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 2:
        Copyright (c) 2024 IETF Trust and the persons identified as the document
        authors.  All rights reserved.

     IETF Trust Legal Provisions of 28-dec-2009, Section 6.b(i), paragraph 3:
        This document is subject to BCP 78 and the IETF Trust's Legal Provisions
        Relating to IETF Documents
        (https://trustee.ietf.org/license-info) in effect on the date of
        publication of this document.  Please review these documents
        carefully, as they describe your rights and restrictions with
        respect to this document.  Code Components extracted from this
        document must include Simplified BSD License text as described in
        Section 4.e of the Trust Legal Provisions and are provided
        without warranty as described in the Simplified BSD License.


  Checking nits according to https://www.ietf.org/id-info/1id-guidelines.txt:
  ----------------------------------------------------------------------------

  ** Missing expiration date.  The document expiration date should appear on
     the first and last page.

  ** The document seems to lack a 1id_guidelines paragraph about
     Internet-Drafts being working documents. 

  ** The document seems to lack a 1id_guidelines paragraph about 6 months
     document validity -- however, there's a paragraph with a matching
     beginning. Boilerplate error?

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     current Internet-Drafts. 

  ** The document seems to lack a 1id_guidelines paragraph about the list of
     Shadow Directories. 

  ** Expected the document's filename to be given on the first page, but
     didn't find any

  == No 'Intended status' indicated for this document; assuming Proposed
     Standard

  == The page length should not exceed 58 lines per page, but there was 1
     longer page, the longest (page 1) being 387 lines


  Checking nits according to https://www.ietf.org/id-info/checklist :
  ----------------------------------------------------------------------------

  ** The document seems to lack an Abstract section.

  ** The document seems to lack an IANA Considerations section.  (See Section
     2.2 of https://www.ietf.org/id-info/checklist for how to handle the case
     when there are no actions for IANA.)

  ** The document seems to lack an Authors' Addresses Section.

  ** The document seems to lack separate sections for Informative/Normative
     References.  All references will be assumed normative when checking for
     downward references.


  Miscellaneous warnings:
  ----------------------------------------------------------------------------

  == Couldn't figure out when the document was first submitted -- there may
     be comments or warnings related to the use of a disclaimer for pre-RFC5378
     work that could not be issued because of this.  Please check the Legal
     Provisions document at https://trustee.ietf.org/license-info to determine
     if you need the pre-RFC5378 disclaimer.

  -- Couldn't find a document date in the document -- date freshness check
     skipped.


  Checking references for intended status: Proposed Standard
  ----------------------------------------------------------------------------

     (See RFCs 3967 and 4897 for information about using normative references
     to lower-maturity documents in RFCs)

  == Unused Reference: '2' is defined on line 358, but no explicit reference
     was found in the text

  ** Obsolete normative reference: RFC 1889 (ref. '1') (Obsoleted by RFC 3550)

  -- Possible downref: Non-RFC (?) normative reference: ref. '2'

  ** Obsolete normative reference: RFC 1890 (ref. '3') (Obsoleted by RFC 3551)

  -- Possible downref: Non-RFC (?) normative reference: ref. '4'

  ** Obsolete normative reference: RFC 2032 (ref. '5') (Obsoleted by RFC 4587)

  ** Downref: Normative reference to an Historic RFC: RFC 2190 (ref. '6')


     Summary: 15 errors (**), 0 flaws (~~), 4 warnings (==), 3 comments (--).

     Run idnits with the --verbose option for more detailed information about
     the items above.

--------------------------------------------------------------------------------

1	Internet Engineering Task Force                 Audio-Video Transport WG
2	INTERNET-DRAFT                                 C. Bormann / Univ. Bremen
3	                                                        L. Cline / Intel
4	                                                      G. Deisher / Intel
5	                                                       T. Gardos / Intel
6	                                                     C. Maciocco / Intel
7	                                                       D. Newell / Intel
8	                                                   J. Ott / Univ. Bremen
9	                                                   S. Wenger / TU Berlin
10	                                                          C. Zhu / Intel

12	               RTP Payload Format for the 1998 Version of
13	                    ITU-T Rec. H.263 Video (H.263+)

15	Status of This Memo

17	This document is an Internet-Draft.  Internet-Drafts are working
18	documents of the Internet Engineering Task Force (IETF), its areas, and
19	its working groups.  Note that other groups may also distribute working
20	documents as Internet-Drafts.

22	Internet-Drafts are draft documents valid for a maximum of six months
23	and may be updated, replaced, or made obsolete by other documents at any
24	time.  It is inappropriate to use Internet-Drafts as reference material
25	or to cite them other than as "work in progress."

27	To learn the current status of any Internet-Draft, please check the
28	"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
29	Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
30	munnari.oz.au (Pacific Rim), ds.internic.net (US East Coast), or
31	ftp.isi.edu (US West Coast).

33	Distribution of this document is unlimited.

35	1. Introduction

37	This document specifies an RTP payload header format applicable to the
38	transportation of video streams generated based on the 1998 version of
39	ITU-T Recommendation H.263.

41	The 1998 version of ITU-T Recommendation H.263 added numerous coding
42	options to improve codec performance over the 1996 version.  The 1998
43	version is referred to as H.263+ in this document.  Among the new
44	options, the ones with the biggest impact on the RTP payload are the
45	slice structured mode (SS), independent segment decoding mode (ISD), and
46	the scalability mode.  This section summarizes the impact of these new
47	coding options on packetization.  Refer to [4] for more information on
48	coding options.

50	Slice structure was added to H.263+ for three purposes: to provide
51	enhanced error resilience capability, to make the bitstream more
52	amenable to use with an underlying packet transport such as RTP, and to
53	minimize video delay.  The slice structured mode supports fragmentation
54	at macroblock boundaries.

56	When the independent segment decoding option is employed, a video
57	picture frame is broken into segments and encoded in such a way that
58	each segment is independently decodable.  Utilizing ISD in a lossy
59	network environment helps prevent the propagation of errors from one
60	segment of the picture to others.

62	H.263+ also includes bitstream scalability as an optional coding mode.
63	Three kinds of scalability are defined: temporal, signal-to-noise ratio
64	(SNR), and spatial scalability.  Temporal scalability is achieved via
65	the disposable nature of bi-directionally predicted frames, or B-frames.
66	SNR scalability permits refinement of encoded video frames, thereby
67	improving the quality (or SNR).  Spatial scalability is similar to SNR
68	scalability except the refinement layer is twice the size of the base
69	layer in the horizontal dimension, vertical dimension, or both.

71	2. Usage of RTP

73	When transmitting H.263+ video streams over the Internet, the output of
74	the encoder can be packetized directly.  All the bits resulting from the
75	bitstream including the fixed length codes and variable length codes
76	will be included in the packet.

78	For H.263+ bitstreams coded with temporal, spatial, or SNR scalability,
79	each layer may be transported to a different network address.  More
80	specifically, each layer may use a unique IP address and port
81	combination.  In addition, temporal relations between layers shall be
82	expressed using the RTP timestamp so that they can be synchronized at
83	the receiving ends in multicast or unicast applications.

85	The H.263+ video streams will be carried as payload data within RTP
86	packets.  A new H.263+ payload header is defined in section 4.  This
87	section defines the usage of the RTP fixed header and the H.263+ video
88	packet structure.

90	2.1 RTP Header Usage

92	Each RTP packet starts with a fixed RTP header.  The following fields of
93	the RTP fixed header are used for H.263+ video streams:

95	Marker bit (M bit): The Marker bit of the RTP header is set to 1 when
96	the packet carries the end of the current frame, and is 0 otherwise.

98	Payload Type (PT): The Payload Type shall specify the H.263+ video
99	payload format.  A dynamic payload type can be used initially until a
100	static payload type is assigned.

102	Timestamp: The RTP Timestamp encodes the sampling instant of the first
103	video frame contained in the RTP data packet.  The RTP timestamp may be
104	the same on successive packets if a video frame occupies more than one
105	packet.  In a multilayer scenario, all pictures corresponding to the
106	same temporal reference should carry the same timestamp.  If temporal
107	scalability is used and B-frames are present, the timestamp may not be
108	monotonically increasing in the video stream.  If B-frames are
109	transmitted on a separate layer and address, they must be synchronized
110	properly with the reference frames.  Please refer to the 1998 ITU
111	Recommendation for H.263 [4] for information on required transmission
112	order to a decoder.  For an H.263+ video stream, the RTP timestamp is
113	based on a 90 kHz clock, the same as that of the RTP payload format for
114	H.261 streams [5].
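
As a rough illustration of the 90 kHz clock, the following Python sketch
converts a frame index into an RTP timestamp.  The function name, the
frame-rate arguments, and the initial offset are assumptions made for
this example and are not defined by this document.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 90000  # 90 kHz clock used for H.263+ RTP timestamps


def rtp_timestamp(frame_index, fps_num, fps_den=1, initial=0):
    """Return the 32-bit RTP timestamp of the frame sampled at
    frame_index / (fps_num / fps_den) seconds.

    All parameters are hypothetical; `initial` stands for the offset a
    sender would normally choose at random.
    """
    ticks = Fraction(frame_index * RTP_CLOCK_HZ * fps_den, fps_num)
    # RTP timestamps are 32 bits wide and wrap around.
    return (initial + int(ticks)) % (1 << 32)
```

With a frame rate of 30000/1001 Hz, consecutive frames are 3003 ticks
apart.  All packets of one frame would carry the same value.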

116	2.2 Video Packet Structure

118	An H.263+ compressed bitstream is carried as a payload within each RTP
119	packet.  For each RTP packet, the RTP header is followed by an H.263+
120	payload header, which is followed by a standard H.263+ compressed
121	bitstream.  The size of the H.263+ payload header is variable depending
122	on the payload involved, as detailed in section 4.  The layout of the
123	RTP H.263+ video packet is shown as:

125	   0                   1                   2                   3
126	   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
127	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
128	  |    RTP Header                                               ...
129	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
130	  |    H.263+ Payload Header                                    ...
131	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
132	  |    H.263+ Compressed Data Stream                            ...
133	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

135	3. Design Considerations

137	The goal of this payload format is to specify an efficient way of
138	encapsulating an H.263+ standard-compliant bitstream and to enhance
139	resiliency towards packet losses.  Due to the large number of different
140	possible coding schemes in H.263+, a copy of the picture header with
141	configuration information is inserted into the payload header when
142	appropriate.

144	There are a few assumptions and constraints associated with this H.263+
145	payload header design.  The purpose of this section is to point out
146	various design issues and also discuss several coding options provided
147	by H.263+ that may impact the performance of network video.

149	. It is reasonable to assume that no single macroblock will be too large
150	  to fit in a packet.

152	. The optional slice structured mode described in annex K of H.263+ [4]
153	  enables more flexibility for packetization.  Furthermore, packets
154	  based on a slice structure are also inherently more loss resilient.
155	  Similar to a picture segment that begins with a GOB header, the
156	  motion vector predictors in a slice are restricted to reside within
157	  its boundaries.  For these reasons, the use of the slice structured
158	  mode is strongly recommended for network applications.

160	. In non-rectangular slice structured mode, only complete slices should
161	  be included in a packet.  In other words, slices should not be
162	  fragmented across packets.  Optimally, a packet will contain only one
163	  slice.

165	. When the slice structure is not applied, the insertion of a GOB header
166	  in every GOB is recommended to reduce the dependency on motion vector
167	  prediction across GOBs.  See section 3.3 of [6] for more information.

169	. The independent segment decoding mode described in annex R of [4]
170	  does not allow any data dependency across slice or GOB boundaries in
171	  the reference picture.  It can be utilized to further improve
172	  resiliency in high loss conditions.

174	. If ISD is used in conjunction with the slice structure, the
175	  rectangular slice submode shall be enabled and the dimensions and
176	  quantity of the slices present in a frame shall remain the same
177	  between two intra-coded frames (I-frames).  The ISD segments may be
178	  entirely intra coded from time to time to realize quick error
179	  recovery without adding latency time associated with sending complete
180	  I-frames.

182	. For resiliency, sending a full picture header for every frame is
183	  recommended.  In other words, the sender should always set the
184	  subfield UFEP in PLUSPTYPE to '001' in the video bitstream.

186	. In a multi-layer scenario, each layer can be transmitted to a
187	  different network address.  The configuration of each layer such as
188	  the enhancement layer number (ELNUM), reference layer number (RLNUM),
189	  and scalability type should be determined at the start of the session
190	  and should not change during the course of the session.

192	4. H.263+ Payload Header

194	For H.263+ video streams, each RTP packet carries only one H.263+ video
195	packet.  The H.263+ payload header is always present for each H.263+
196	video packet.  The payload header has variable length.  If a picture
197	header is included in the payload header, the length of the picture
198	header in number of bytes is specified by PLEN.  The minimum length of
199	the payload header is 32 bits, corresponding to a PLEN of 0.

201	The H.263+ payload header is structured as follows:

203	   0                   1                   2                   3
204	   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
205	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
206	  |V=0|SBIT |EBIT |  PLEN   |PEBIT| TID | Trun  |       RR        |
207	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
208	  |1 0 0 0 0 0| picture header starting with TR, PTYPE, ...       .
209	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

211	  V: 2 bits
212	  Version number.  Set to '00' for this payload format.
213	  [Ed. Note: The version control will not take effect until a draft has
214	  been formally submitted to the IETF.]

216	  SBIT: 3 bits
217	  Start bit position specifies the number of bits that should be
218	  ignored in the first data byte of the payload.

220	  EBIT: 3 bits
221	  End bit position indicates the number of bits that should be ignored
222	  in the last data byte of the payload.

224	  PLEN: 5 bits
225	  Picture header length in number of bytes.

227	  PEBIT: 3 bits
228	  End bit position indicates the number of bits that should be ignored
229	  in the last byte of the picture header.

231	  TID: 3 bits
232	  Thread id.  Used only in optional video redundancy coding mode (VRC).
233	  See annex N of [4].  All three bits must be set to 0 unless VRC mode
234	  is applied.

236	  Trun: 4 bits
237	  Cyclic packet number.  Used only in optional VRC mode.  These bits
238	  must be set to 0 unless VRC mode is applied.

240	  RR: 9 bits
241	  Reserved bits.

243	Notice that the TID and Trun fields are associated only with the video
244	redundancy coding usage scenario derived from the reference picture
245	selection mode specified in annex N of [4].  The TID and Trun bits must
246	be set to 0 if VRC is not used.  The use of VRC shall be negotiated by
247	external means.
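
The field layout above can be sketched as a small Python decoder.  This
is only an illustration: the names PayloadHeader and
parse_payload_header are hypothetical, and PLEN is treated as the 5-bit
field drawn in the diagram.

```python
import struct
from collections import namedtuple

PayloadHeader = namedtuple(
    "PayloadHeader", "v sbit ebit plen pebit tid trun rr")


def parse_payload_header(packet):
    """Decode the first 32-bit word of the H.263+ payload header.

    `packet` holds the bytes that follow the RTP fixed header.
    """
    if len(packet) < 4:
        raise ValueError("payload header needs at least 4 bytes")
    (word,) = struct.unpack("!I", packet[:4])  # network byte order
    return PayloadHeader(
        v=(word >> 30) & 0x3,      # bits 0-1
        sbit=(word >> 27) & 0x7,   # bits 2-4
        ebit=(word >> 24) & 0x7,   # bits 5-7
        plen=(word >> 19) & 0x1F,  # bits 8-12
        pebit=(word >> 16) & 0x7,  # bits 13-15
        tid=(word >> 13) & 0x7,    # bits 16-18
        trun=(word >> 9) & 0xF,    # bits 19-22
        rr=word & 0x1FF,           # bits 23-31
    )
```

A receiver following this sketch would find the start of the bitstream
data at byte offset 4 + PLEN of the RTP payload.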

249	4.1 Encapsulating Packet that Begins with PSC

251	Any packet that begins with a picture start code (PSC), i.e. the first
252	packet of a picture frame, shall be encapsulated using only the first
253	32-bit word of the payload header since a picture header is already
254	included in the data bitstream.  In this case, PLEN shall be 0.

256	Here is an example of encapsulating the first packet in a frame:

258	   0                   1                   2                   3
259	   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
260	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
261	  |0 0|SBIT |EBIT |0 0 0 0 0|0 0 0| TID | Trun  |       RR        |
262	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
263	  | bitstream data starts with complete picture header ...        .
264	  +---------------------------------------------------------------+
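
For illustration only, the 32-bit word shown above could be assembled as
in the following Python sketch.  The function name build_payload_header
and its keyword arguments are hypothetical; V and RR are left at 0, and
PLEN is treated as the 5-bit field drawn in the diagram.

```python
import struct


def build_payload_header(sbit=0, ebit=0, plen=0, pebit=0, tid=0, trun=0):
    """Pack the first 32-bit word of the payload header (V = 0, RR = 0)."""
    fields = ((sbit, 3), (ebit, 3), (plen, 5),
              (pebit, 3), (tid, 3), (trun, 4))
    word = 0  # V = '00' ends up in the two most significant bits
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("field value out of range")
        word = (word << width) | value
    word <<= 9  # RR: nine reserved bits, left as 0
    return struct.pack("!I", word)
```

The first packet of a frame whose last data byte has three unused bits
would then start with build_payload_header(ebit=3), followed directly by
the bitstream data beginning with the PSC.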

266	4.2 Encapsulating Packet that Begins with GBSC or SSC

268	Any packet that begins with either a GOB start code (GBSC) or a slice
269	start code (SSC) shall include a copy of the picture header in the
270	payload header for resiliency.  PLEN shall be set to specify the length
271	of the included picture header in bytes.  Hence, PLEN > 0.  The end bit
272	position corresponding to the last byte of the picture header data is
273	indicated by PEBIT.  Actual bitstream data shall begin on an 8-bit byte
274	boundary following the payload header.

276	Notice that only the last six bits of the picture start code, '100000',
277	are included in the payload header.  A complete H.263+ picture header
278	with byte aligned picture start code can be conveniently assembled if
279	needed on the receiving end by prepending the sixteen leading '0' bits.

281	Assuming a PLEN of 9, below is an example of a packet that begins with a
282	GBSC or an SSC:

284	   0                   1                   2                   3
285	   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
286	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
287	  |0 0|SBIT |EBIT |0 1 0 0 1|PEBIT| TID | Trun  |       RR        |
288	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
289	  |1 0 0 0 0 0| picture header starting with TR, PTYPE, ...       |
290	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
291	  | ...                                                           |
292	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
293	  | ...           | bitstream data begins with GBSC/SSC ...       .
294	  +-+-+-+-+-+-+-+-+-----------------------------------------------+
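
The reassembly of a byte-aligned picture header described in this
section can be sketched in Python as follows.  The function name
split_gob_packet is hypothetical, and the sketch assumes that `packet`
holds the RTP payload (payload header plus data) and that PLEN is the
5-bit field drawn in the diagram.

```python
def split_gob_packet(packet):
    """Return (picture_header, pebit, bitstream) for a packet with PLEN > 0.

    picture_header starts with the complete, byte-aligned PSC; the last
    `pebit` bits of its final byte are to be ignored.
    """
    if len(packet) < 4:
        raise ValueError("payload header needs at least 4 bytes")
    plen = (packet[1] >> 3) & 0x1F  # bits 8-12 of the first word
    pebit = packet[1] & 0x7         # bits 13-15 of the first word
    if plen == 0:
        raise ValueError("no picture header copy in this packet")
    copy = packet[4:4 + plen]
    if len(copy) < plen:
        raise ValueError("packet shorter than PLEN indicates")
    if copy[0] & 0xFC != 0x80:
        raise ValueError("copy does not start with '100000'")
    # Prepend the sixteen leading '0' bits of the picture start code.
    picture_header = b"\x00\x00" + copy
    return picture_header, pebit, packet[4 + plen:]
```

The returned bitstream begins with the GBSC or SSC and still needs the
SBIT and EBIT values of the payload header applied to it.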

296	4.3 Encapsulating Follow-On Packet

298	When the slice structured mode is not applied, some GOBs in the
299	bitstream may be larger than the size of one packet.  Similarly, when
300	the ISD option is applied, a picture segment may be larger than the
301	required packet size.  A packet carrying a remaining fragment of such
302	a segment is termed a "follow-on" packet in this document.

304	These follow-on packets with data fragmented at the macroblock
305	boundaries are not independently recoverable.  In this case, the payload
306	header includes only the first 32-bit word and PLEN shall be set to 0.
307	A receiver should discard any follow-on packet it receives if the
308	preceding packet containing the segment header information has been
309	lost.

311	Here is an example of a follow-on packet:

313	   0                   1                   2                   3
314	   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
315	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
316	  |0 0|SBIT |EBIT |0 0 0 0 0|0 0 0| TID | Trun  |       RR        |
317	  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
318	  | sub-segment bitstream data ...                                .
319	  +---------------------------------------------------------------+

321	Even though they may have identical payload headers, a follow-on packet
322	can be differentiated from the first packet in a frame since the data in
323	a follow-on packet does not begin with a PSC.
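
A receiver might tell the three packet kinds of sections 4.1 to 4.3
apart roughly as in the Python sketch below.  The function name and the
returned labels are hypothetical, `packet` is assumed to hold the RTP
payload, and PLEN is treated as the 5-bit field drawn in the diagram.

```python
PSC = 0b0000000000000000100000  # 22-bit picture start code
PSC_MASK = (1 << 22) - 1


def classify_packet(packet):
    """Return 'gob-or-slice', 'first-of-picture' or 'follow-on'."""
    if len(packet) < 4:
        raise ValueError("payload header needs at least 4 bytes")
    sbit = (packet[0] >> 3) & 0x7   # bits 2-4 of the first word
    plen = (packet[1] >> 3) & 0x1F  # bits 8-12 of the first word
    if plen > 0:
        return "gob-or-slice"       # picture header copy is present
    # Look at the first 22 data bits after skipping SBIT ignored bits.
    window = int.from_bytes(packet[4:8].ljust(4, b"\x00"), "big")
    leading = (window >> (10 - sbit)) & PSC_MASK
    return "first-of-picture" if leading == PSC else "follow-on"
```

A packet classified as follow-on would be discarded when the preceding
packet of the same segment was lost, as described above.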

325	5. Security Considerations

327	RTP packets using the payload format defined in this specification are
328	subject to the security considerations discussed in the RTP
329	specification [1], and any appropriate RTP profile (for example [3]).
330	This implies that confidentiality of the media streams is achieved by
331	encryption.  Because the data compression used with this payload format
332	is applied end-to-end, encryption may be performed after compression so
333	there is no conflict between the two operations.

335	A potential denial-of-service threat exists for data encodings using
336	compression techniques that have non-uniform receiver-end computational
337	load.  The attacker can inject pathological datagrams into the stream
338	which are complex to decode and cause the receiver to be overloaded.
339	However, this encoding does not exhibit any significant non-uniformity.

341	As with any IP-based protocol, in some circumstances a receiver may be
342	overloaded simply by the receipt of too many packets, either desired or
343	undesired.  Network-layer authentication may be used to discard packets
344	from undesired sources, but the processing cost of the authentication
345	itself may be too high.  In a multicast environment, pruning of specific
346	sources may be implemented in future versions of IGMP and in
347	multicast routing protocols to allow a receiver to select which sources
348	are allowed to reach it.

350	A security review of this payload format found no additional
351	considerations beyond those in the RTP specification.

353	6. References

355	[1] H. Schulzrinne, S. Casner, R. Frederick, V. Jacobson, "RTP: A
356	    Transport Protocol for Real-Time Applications", RFC 1889.

358	[2] "Video Codec for Audiovisual Services at px64 kbits/s", ITU-T
359	    Recommendation H.261, 1993.

361	[3] H. Schulzrinne, "RTP Profile for Audio and Video Conferences with
362	    Minimal Control", RFC 1890.

364	[4] "Video Coding for Low Bitrate Communication", Draft ITU-T
365	    Recommendation H.263, Draft 20, September 1997.

367	[5] T. Turletti, C. Huitema, "RTP Payload Format for H.261 Video
368	    Streams", RFC 2032.

370	[6] C. Zhu, "RTP Payload Format for H.263 Video Streams", RFC 2190.