[AVT] draft-ietf-avt-uncomp-video-00.txt
Hi, all,
Here is a marked-up version of draft-ietf-avt-uncomp-video-00.txt.
I won't be at the AVT meetings, but have a good time! :-)
Cheers,
Chuck Harrison
Far Field Associates, LLC
+1 360 863 8340 (voice) PST = GMT-0800
Internet Engineering Task Force AVT WG
INTERNET-DRAFT Ladan Gharai
draft-ietf-avt-uncomp-video-00.txt Colin Perkins
USC/ISI
17 October 2002
Expires: April 2003
RTP Payload Format for Uncompressed Video
Changes marked up 28-Oct-02 Chuck Harrison
Most changes indicated with {{{ ... }}}
Status of this Memo
[...]
Abstract
This memo specifies a packetization scheme for encapsulating
uncompressed HDTV as defined by SMPTE 274M and SMPTE 296M into
a payload format for the Real-Time Transport Protocol (RTP).
SMPTE 274M and SMPTE 296M define the analog and digital
representation of HDTV with image formats of 1920x1080 and
1280x720, respectively. The payload has been designed such
that it may scale to future higher resolutions, such as
Digital Cinema.
1. Introduction
This memo defines a scheme to packetize uncompressed, studio-quality,
video streams for transport using RTP [RTP]. It supports a range of
standard and high definition video formats, including ITU-R BT.601
[601], SMPTE 274M [274] and SMPTE 296M [296].
[...]
Although these formats differ in their details, they are structurally
very similar. This memo specifies a payload format to encapsulate these,
and other similar, video formats for transport within RTP.
[...]
3. Payload Design
Each scan line of digital video is packetized into one or more
(depending on the current MTU) RTP packets. A single RTP packet MAY also
contain data for more than one scan line. Only the active samples are
included in the RTP payload; inactive samples and the contents of
horizontal and vertical blanking SHOULD NOT be transported. Scan line
numbers are included in the RTP payload header, along with a field
identifier for interlaced video.
For SMPTE 296M format video, valid scan line numbers are from 26
through 745, inclusive. For progressive scan SMPTE 274M format
video, valid scan lines are from scan line 42 through 1121
inclusive. For interlaced scan, valid scan line numbers for field
one (F=0) are from 21 to 560 and valid scan line numbers for the
second field (F=1) are from 584 to 1123. For ITU-R BT.601 format
video, the blanking intervals defined in BT.656 are used: for 625
line video, lines 24 to 310 of field one (F=0) and 337 to 623 of
the second field (F=1) are valid; for 525 line video, lines 21 to
263 of the first field, and 284 to 525 of the second field are
valid. Other formats (e.g. [372]) may define different ranges of
active lines.
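The active line ranges listed above can be collected into a small lookup
table. The following Python sketch is illustrative only: the format
labels used as dictionary keys and the function name is_active_line are
hypothetical and are not defined by this memo.

```python
# Active scan line ranges, copied from the paragraph above.
# Keys are (format label, field bit F); the labels are hypothetical names.
ACTIVE_LINES = {
    ("296M", 0): (26, 745),
    ("274M-progressive", 0): (42, 1121),
    ("274M-interlaced", 0): (21, 560),
    ("274M-interlaced", 1): (584, 1123),
    ("601-625", 0): (24, 310),
    ("601-625", 1): (337, 623),
    ("601-525", 0): (21, 263),
    ("601-525", 1): (284, 525),
}


def is_active_line(fmt, field, line):
    """Return True if `line` lies in the active range for format and field."""
    bounds = ACTIVE_LINES.get((fmt, field))
    if bounds is None:
        # Unknown format, or a field that the format does not have.
        return False
    first, last = bounds
    return first <= line <= last
```

A receiver could use such a check to discard payload headers whose scan
line number falls in a blanking interval.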
Sample values for pixels may be transferred as 8 bit or 10 bit values.
{{{ In extended formats 12 bit and 16 bit values are also supported.
For 10 bit and 12 bit payloads,}}} care must be taken such that the
payload is also octet aligned.
However, for video content it is desirable for the video to be both
octet aligned when packetized and also adhere to the principles of
application level framing [ALF]. For YCrCb video, the ALF principle
translates into not fragmenting related luminance and chrominance values
across packets. For example, with 4:2:0 color subsampling each group of
4 pixels is represented by 6 values, Y1 Y2 Y3 Y4 Cr Cb, and video
content should be packetized such that these values are not fragmented
across a packet boundary. With 10 bit words this is a 60 bit value which
is not octet aligned. To be both octet aligned, and appropriately
framed, pixels must be framed in 2 groups of 4, thereby becoming octet
aligned on a 15 octet boundary. This length is referred to as the pixel
group ("pgroup"), and it is conveyed in the SDP parameters. Tables 1 and
2 display the pgroup value for 4:2:2 and 4:4:4 color samplings, for 10
bit and 8 bit words.
{{{
                               10 bit words
    Color            --------------------------------
 Subsampling Pixels   #words octet alignment pgroup
+-----------+------+ +------+---------------+-------+
|monochrome |  4   | | 4x10 | 40/8 = 5      |   5   |
+-----------+------+ +------+---------------+-------+
|   4:2:0   |  4   | | 6x10 | 2x60/8 = 15   |  15   |
+-----------+------+ +------+---------------+-------+
|   4:2:2   |  2   | | 4x10 | 40/8 = 5      |   5   |
+-----------+------+ +------+---------------+-------+
|   4:4:4   |  1   | | 3x10 | 4x30/8 = 15   |  15   |
+-----------+------+ +------+---------------+-------+
|  4:4:4:4  |  1   | | 4x10 | 40/8 = 5      |   5   |
+-----------+------+ +------+---------------+-------+
Table 1: pgroup values for 10 bit sampling
                               8 bit words
    Color            --------------------------------
 Subsampling Pixels   #words octet alignment pgroup
+-----------+------+ +------+---------------+-------+
|monochrome |  1   | | 1x8  | 8/8 = 1       |   1   |
+-----------+------+ +------+---------------+-------+
|   4:2:0   |  4   | | 6x8  | 6x8/8 = 6     |   6   |
+-----------+------+ +------+---------------+-------+
|   4:2:2   |  2   | | 4x8  | 4x8/8 = 4     |   4   |
+-----------+------+ +------+---------------+-------+
|   4:4:4   |  1   | | 3x8  | 3x8/8 = 3     |   3   |
+-----------+------+ +------+---------------+-------+
|  4:4:4:4  |  1   | | 4x8  | 4x8/8 = 4     |   4   |
+-----------+------+ +------+---------------+-------+
Table 2: pgroup values for 8 bit sampling
                               12 bit words
    Color            --------------------------------
 Subsampling Pixels   #words octet alignment pgroup
+-----------+------+ +------+---------------+-------+
|monochrome |  2   | | 2x12 | 2x12/8 = 3    |   3   |
+-----------+------+ +------+---------------+-------+
|   4:2:0   |  4   | | 6x12 | 72/8 = 9      |   9   |
+-----------+------+ +------+---------------+-------+
|   4:2:2   |  2   | | 4x12 | 48/8 = 6      |   6   |
+-----------+------+ +------+---------------+-------+
|   4:4:4   |  2   | | 6x12 | 2x36/8 = 9    |   9   |
+-----------+------+ +------+---------------+-------+
|  4:4:4:4  |  1   | | 4x12 | 48/8 = 6      |   6   |
+-----------+------+ +------+---------------+-------+
Table 3: pgroup values for 12 bit sampling
                               16 bit words
    Color            --------------------------------
 Subsampling Pixels   #words octet alignment pgroup
+-----------+------+ +------+---------------+-------+
|monochrome |  1   | | 1x16 | 16/8 = 2      |   2   |
+-----------+------+ +------+---------------+-------+
|   4:2:0   |  4   | | 6x16 | 6x16/8 = 12   |  12   |
+-----------+------+ +------+---------------+-------+
|   4:2:2   |  2   | | 4x16 | 4x16/8 = 8    |   8   |
+-----------+------+ +------+---------------+-------+
|   4:4:4   |  1   | | 3x16 | 3x16/8 = 6    |   6   |
+-----------+------+ +------+---------------+-------+
|  4:4:4:4  |  1   | | 4x16 | 4x16/8 = 8    |   8   |
+-----------+------+ +------+---------------+-------+
Table 4: pgroup values for 16 bit sampling
}}}
When packetizing digital active line content, video data MUST NOT be
fragmented within a pgroup.
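The pgroup values in Tables 1 to 4 all follow one rule: take the number
of bits in the smallest group of related samples and find the shortest
run of such groups that ends on an octet boundary. The Python sketch
below reproduces the table entries under that reading. The names
SAMPLES_PER_GROUP and pgroup_octets are hypothetical and are not part of
this memo.

```python
from math import gcd

# Samples in the smallest group of related values for each sampling mode.
SAMPLES_PER_GROUP = {
    "monochrome": 1,
    "4:2:0": 6,    # Y1 Y2 Y3 Y4 Cr Cb
    "4:2:2": 4,    # Cb0 Y0 Cr0 Y1
    "4:4:4": 3,
    "4:4:4:4": 4,
}


def pgroup_octets(sampling, bits):
    """Return the pgroup size in octets for a sampling mode and bit depth."""
    group_bits = SAMPLES_PER_GROUP[sampling] * bits
    # Least common multiple of the group size and 8 bits, i.e. the
    # shortest run of whole groups that is also a whole number of octets.
    aligned_bits = group_bits * 8 // gcd(group_bits, 8)
    return aligned_bits // 8
```

For example, pgroup_octets("4:2:0", 10) evaluates to 15, matching the
worked example in the text (two groups of 60 bits).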
{{{
Video content is almost always associated with additional information
such as audio tracks, time code, etc. In professional digital video
applications this data is commonly embedded in non-video portions of
the data stream (horizontal and vertical blanking periods) so that
precise and robust synchronization is maintained. This payload format
envisions that applications requiring such synchronized ancillary data
should deliver it in separate RTP sessions which operate concurrently
with the video session. In order to maintain robust synchronization in
the face of equipment which changes RTP timestamps, and to support
off-speed, reverse, or other "trick" playout, this payload allows the
sender to insert 32-bit frame numbers into the stream. [Note:
applications which require embedded ancillary data should consider an
alternative payload format such as [292RTP].]
}}}
4. RTP Packetization
The standard RTP header is followed by a payload header {{{containing a
4 octet frame number,}}} and an 8 octet header for each
line (or partial line) of video included. One or more lines, or partial
lines, of payload data follow. For example, if two lines of video are
encapsulated, the payload format will be as shown in Figure 1.
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | V |P|X|  CC   |M|     PT      |          Sequence No          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          Time Stamp                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             SSRC                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
{{{ |                         Frame number                          | }}}
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         Scan Line No          |          Scan Offset          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |            Length             |F|M|             Z             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |         Scan Line No          |          Scan Offset          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |            Length             |F|M|             Z             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    .                                                               .
    .               Two (partial) lines of video data               .
    .                                                               .
    +---------------------------------------------------------------+
Figure 1: RTP Payload Format showing two (partial) lines of video
4.1. The RTP Header
The fields of the fixed RTP header have their usual meaning, with the
following additional notes:
Payload Type (PT): 7 bits
A dynamically allocated payload type field which designates the
payload as uncompressed video.
Timestamp: 32 bits
A 90 kHz timestamp MUST be used to denote the sampling instant of
the video frame to which the RTP packet belongs. Packets MUST NOT
include data from multiple frames, and all packets belonging to the
same frame MUST have the same timestamp. {{{ [Consider whether the two
fields of interlaced video MAY have distinct timestamps. In
some ways this is more "natural" for true interlaced video and
distinguishes it from "progressive segmented frame" (PsF) mode
in which the two fields really do refer to the same time instant.] }}}
Marker bit (M): 1 bit
The Marker bit denotes the end of a video frame, and MUST be set to
1 for the last packet of the video frame. It MUST be set to 0 for
other packets.
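As an illustration of the timestamp and marker rules above, the
following Python sketch derives the 90 kHz timestamp for a given frame
and the marker bits for the packets of one frame. It assumes a constant
frame rate and one timestamp per frame, as the current text requires.
The function names and the initial offset parameter are hypothetical.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 90000  # clock rate required for the Timestamp field


def frame_timestamp(frame_index, frame_rate, initial=0):
    """Return the 32-bit RTP timestamp shared by all packets of a frame.

    `frame_rate` may be an int or a Fraction such as Fraction(30000, 1001).
    `initial` is the timestamp of frame 0 (an assumption of this sketch).
    """
    ticks = Fraction(RTP_CLOCK_HZ) * frame_index / Fraction(frame_rate)
    return (initial + round(ticks)) % 2**32  # the field wraps at 32 bits


def marker_bits(packets_in_frame):
    """Marker is 1 only on the last packet of the frame, 0 elsewhere."""
    return [0] * (packets_in_frame - 1) + [1]
```

At 30000/1001 frames per second, successive frames are 3003 ticks
apart; at 25 frames per second they are 3600 ticks apart.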
4.2. Payload Header
{{{
Frame Number : 32 bits
Frame number of the encapsulated data in network byte order.
Successive RTP packets MAY contain parts of the same frame.
A single RTP packet MUST NOT contain parts of two different
frames. The frame number SHOULD increment by one for every video
frame (every two fields, for interlaced formats) in normal
play time sequence. }}}
Scan Line No : 16 bits
Scan line number of encapsulated data in network byte order.
Successive RTP packets MAY contain parts of the same scan line
(with an incremented RTP sequence number, but the same timestamp),
if it is necessary to fragment a line.
Scan Offset : 16 bits
Sample number of the co-sited luminance sample (if YUV format data
is being transported), or the red sample (if RGB format data is
transported) where the scan line is fragmented, in network byte
order. {{{ [Clarify - is this the sample number of the first sample
in the packet? Or could it alternatively identify the last sample
of a packet? Is the first sample number of a line 0 or 1? What
is the entry in a header describing a non-fragmented line? The
answers may be obvious to you but spell them out anyway.] }}}
Length: 16 bits
Number of octets of data included. This MUST be a multiple of the
pgroup value.
Field Identification (F): 1 bit
Identifies which field the scan line belongs to, for interlaced
data. F=0 identifies the first field and F=1 the second field.
For progressive data (SMPTE 296M) F MUST always be set to zero.
Follow On (more lines) bit (M): 1 bit
Determines if an additional payload header follows the current
header in the RTP packet. Set to 1 if an additional header follows,
implying that the RTP packet is carrying data for more than one
scan line. Set to 0 otherwise.
Reserved (Z): 14 bits
These bits SHOULD be set to zero by the sender and MUST be ignored
by receivers.
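The field layout above can be exercised with a short Python sketch
using the standard struct module. It assumes the proposed 4 octet frame
number is present, and that F and M are the two most significant bits
of the 16 bit word that follows Length, as drawn in Figure 1. The
function names are hypothetical.

```python
import struct


def pack_payload_header(frame_number, lines):
    """Build the payload header for one packet.

    `lines` is a list of (scan_line_no, scan_offset, length, field)
    tuples, one per line or partial line carried in the packet.
    """
    out = struct.pack("!I", frame_number)
    for i, (line_no, offset, length, field) in enumerate(lines):
        more = 1 if i < len(lines) - 1 else 0    # Follow On bit (M)
        flags = (field << 15) | (more << 14)     # Z bits are left at zero
        out += struct.pack("!HHHH", line_no, offset, length, flags)
    return out


def parse_payload_header(data):
    """Return (frame_number, lines, header_size_in_octets)."""
    (frame_number,) = struct.unpack_from("!I", data, 0)
    pos, lines = 4, []
    while True:
        line_no, offset, length, flags = struct.unpack_from("!HHHH", data, pos)
        pos += 8
        lines.append((line_no, offset, length, (flags >> 15) & 1))
        if not (flags >> 14) & 1:    # no further line header follows
            break
    return frame_number, lines, pos
```

The parser ignores the Z bits, as receivers are required to do. Video
data for the listed lines begins at the returned header size.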
4.3. Payload Data
Depending on the video format, each RTP packet can include either a
single complete scan line, a single fragment of a scan line, or one (or
more) complete scan lines plus a fragment of a scan line. {{{ Every scan
line or scan line fragment MUST begin at an octet boundary in the payload
data. }}}
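A sender that must fragment a scan line can choose fragment lengths as
in the Python sketch below, which keeps every fragment a whole number
of pgroups. The function name and the max_payload parameter (the octets
left for video data after all headers) are hypothetical. The sketch
reports octet offsets within the line; mapping these to the Scan Offset
field is deliberately left out, since the definition of that field is
still open in the text.

```python
def fragment_line(line_octets, pgroup, max_payload):
    """Split one scan line into (octet_offset, length) fragments.

    Every length is a multiple of `pgroup`, so no pgroup is split
    across packets.
    """
    if line_octets % pgroup:
        raise ValueError("line must be a whole number of pgroups")
    # Largest multiple of pgroup that fits in one packet.
    chunk = (max_payload // pgroup) * pgroup
    if chunk == 0:
        raise ValueError("max_payload is smaller than one pgroup")
    return [(start, min(chunk, line_octets - start))
            for start in range(0, line_octets, chunk)]
```

For a 1920 pixel line of 10 bit 4:2:2 video (4800 octets, pgroup of 5)
and 1400 octets of room per packet, this yields three fragments of 1400
octets and a final fragment of 600 octets.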
If the video is in YUV format, the packing of samples into the payload
depends on the color sub-sampling used. For RGB format video, there is a
single packing scheme.
For RGB format video, samples are packed in order Red-Green-Blue. {{{All
samples are the same bit size, which may be 8, 10, 12, or 16 bits. }}}
If 8 bit samples are used, the pgroup is 3 octets. If 10 bit samples
are used, samples from adjacent pixels are packed with no padding,
and the pgroup is 15 octets (4 pixels). Refer to Tables 1 through 4.
{{{
For RGBA format video, samples are packed in order Red-Green-Blue-Alpha.
All samples are the same bit size, which may be 8, 10, 12, or 16 bits.
Refer to Tables 1 through 4. }}}
For YUV 4:4:4 format video, samples are packed in order Cb-Y-Cr. Each
sample is either an 8 bit or a 10 bit value. If 8 bit samples are used,
the pgroup is 3 octets. If 10 bit samples are used, samples from
adjacent pixels are packed with no padding, and the pgroup is 15 octets
(4 pixels).
For YUV 4:2:2 format video, the Cb and Cr components are horizontally
sub-sampled by a factor of two (each Cb and Cr sample corresponds to
two Y components). Samples are packed in order Cb0-Y0-Cr0-Y1. If 8 bit
samples are used, the pgroup is 4 octets. If 10 bit samples are used,
the pgroup is 5 octets.
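The 10 bit 4:2:2 case can be illustrated with the Python sketch below,
which packs one Cb0-Y0-Cr0-Y1 group into a 5 octet pgroup. The text
states only that samples are packed without padding; placing the first
sample in the most significant bits is an assumption of this sketch,
and the function names are hypothetical.

```python
def pack_422_10bit(cb, y0, cr, y1):
    """Pack four 10 bit samples, in order Cb0-Y0-Cr0-Y1, into 5 octets."""
    for sample in (cb, y0, cr, y1):
        if not 0 <= sample < 1024:
            raise ValueError("sample does not fit in 10 bits")
    # Assumed bit order: Cb0 occupies the most significant 10 bits.
    word = (cb << 30) | (y0 << 20) | (cr << 10) | y1
    return word.to_bytes(5, "big")


def unpack_422_10bit(pgroup):
    """Return (cb, y0, cr, y1) from a 5 octet pgroup."""
    word = int.from_bytes(pgroup, "big")
    return tuple((word >> shift) & 0x3FF for shift in (30, 20, 10, 0))
```

Four 10 bit samples fill exactly 40 bits, which is why this sampling
mode needs no fill bits to reach an octet boundary.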
(tbd: YUV 4:2:0 format video)
{{{
It is possible that the scan line length is not evenly divisible by
the number of pixels in a pgroup, so the final pixel data of a scan
line does not align to either an octet or pgroup boundary. Nonetheless
the payload MUST contain a whole number of pgroups; the sender SHOULD
fill the remaining bits of the final pgroup with zero and the receiver
SHOULD ignore the fill data. (In effect, the trailing edge of the image
is black-filled to a pgroup boundary.)
}}}
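The fill rule proposed above amounts to rounding the line width up to a
whole number of pgroups. The Python sketch below computes the resulting
payload size per line and the number of fill pixels. The function name
is hypothetical, and the 1366 pixel width in the usage note is an
arbitrary example rather than a format defined by this memo.

```python
def padded_line(width_pixels, pixels_per_pgroup, pgroup):
    """Return (octets_per_line, fill_pixels) after padding to a pgroup."""
    groups = -(-width_pixels // pixels_per_pgroup)   # ceiling division
    fill_pixels = groups * pixels_per_pgroup - width_pixels
    return groups * pgroup, fill_pixels
```

For 10 bit 4:4:4 video (4 pixels per 15 octet pgroup), a 1920 pixel
line needs 7200 octets and no fill, while a 1366 pixel line needs 5130
octets, of which the last pgroup carries 2 fill pixels.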
5. Required Parameters
(tbd)
Parameters are: color mode (RGB/YUV), color sub-sampling
(4:4:4, 4:2:2, 4:2:0), lines per frame, pixels per line,
bits per sample, and
scan mode (progressive or interlaced). Propose to map these to
SDP a=fmtp: values.
{{{
Optional parameters are: colorimetry (primaries, whitepoint,
reference medium), transfer function (log, gamma, toe
treatment, black offset), image orientation, capture temporal
mode (field integration, frame integration, spot scan,
pushbroom scan). [268], [22028]
}}}
6. RTCP Considerations
[...]
7. IANA Considerations
[...]
9. Security Considerations
[...]
10. Relation to RFC 2431
(tbd) [BT656]
11. Full Copyright Statement
Copyright (C) The Internet Society (2002). All Rights Reserved.
[...]
12. Authors' Addresses
[...]
Bibliography
[...]
{{{
[268] Society of Motion Picture and Television Engineers,
File Format for Digital Moving Picture Exchange (DPX),
SMPTE 268M-1994. (Currently under revision.)
[372] Society of Motion Picture and Television Engineers,
Dual Link 292M Interface for 1920 x 1080 Picture Raster,
SMPTE 372M-2002.
[292RTP] L. Gharai et al., "RTP Payload Format for SMPTE 292M Video",
Internet Draft, draft-ietf-avt-smpte292-video-07.txt,
Work in progress.
[22028] ISO TC42 (Photography), Photography and graphic technology -
Extended colour encodings for digital image storage,
manipulation and interchange - Part 1: Architecture and
requirements, ISO/CD 22028-1, Work in Progress.
}}}