
[AVT] draft-ietf-avt-uncomp-video-00.txt



Hi, all,

Here is a marked-up version of draft-ietf-avt-uncomp-video-00.txt.
I won't be at the AVT meetings, but have a good time! :-)

Cheers,
  Chuck Harrison
  Far Field Associates, LLC
  +1 360 863 8340 (voice) PST = GMT-0800
Internet Engineering Task Force                                   AVT WG
INTERNET-DRAFT                                              Ladan Gharai
draft-ietf-avt-uncomp-video-00.txt                         Colin Perkins
                                                                 USC/ISI
                                                         17 October 2002
                                                     Expires: April 2003


                RTP Payload Format for Uncompressed Video
Changes marked up 28-Oct-02 Chuck Harrison
Most changes indicated with {{{ ... }}}

Status of this Memo

[...]


                                Abstract


     This memo specifies a packetization scheme for encapsulating
     uncompressed HDTV as defined by SMPTE 274M and SMPTE 296M into
     a payload format for  the Real-Time Transport Protocol (RTP).
     SMPTE 274M  and SMPTE 296M  define the analog and digital
     representation of HDTV with image formats of 1920x1080  and
     1280x720, respectively. The payload has been designed such
     that it may scale to future higher resolutions, such as
     Digital Cinema.



1.  Introduction

This memo defines a scheme to packetize uncompressed, studio-quality,
video streams for transport using RTP [RTP]. It supports a range of
standard and high definition video formats, including ITU-R BT.601
[601], SMPTE 274M [274] and SMPTE 296M [296].

[...]
Although these formats differ in their details, they are structurally
very similar. This memo specifies a payload format to encapsulate these,
and other similar, video formats for transport within RTP.


[...]


3.  Payload Design

Each scan line of digital video is packetized into one or more
(depending on the current MTU) RTP packets. A single RTP packet MAY also
contain data for more than one scan line. Only the active samples are
included in the RTP payload; inactive samples and the contents of
horizontal and vertical blanking SHOULD NOT be transported. Scan line
numbers are included in the RTP payload header, along with a field
identifier for interlaced video.


     For SMPTE 296M format video, valid scan line numbers are from 26
     through 745, inclusive. For progressive scan SMPTE 274M format
     video, valid scan lines are from scan line 42 through 1121
     inclusive. For interlaced scan, valid scan line numbers for field
     one (F=0) are from 21 to 560 and valid scan line numbers for the
     second field (F=1) are from 584 to 1123. For ITU-R BT.601 format
     video, the blanking intervals defined in BT.656 are used: for 625
     line video, lines 24 to 310 of field one (F=0) and 337 to 623 of
     the second field (F=1) are valid; for 525 line video, lines 21 to
     263 of the first field, and 284 to 525 of the second field are
     valid.  Other formats (e.g. [372]) may define different ranges of
     active lines.
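
The ranges above can be captured in a small lookup table. The following
Python sketch is illustrative only: the format labels used as dictionary
keys and the function name are assumptions made for this example, not
names defined by this memo.

```python
# Active scan line ranges transcribed from the paragraph above.
# The dictionary keys are illustrative labels, not names from the draft.
# Each entry maps the field bit F to an inclusive (first, last) range.
ACTIVE_LINES = {
    "296M":             {0: (26, 745)},
    "274M-progressive": {0: (42, 1121)},
    "274M-interlaced":  {0: (21, 560), 1: (584, 1123)},
    "601-625":          {0: (24, 310), 1: (337, 623)},
    "601-525":          {0: (21, 263), 1: (284, 525)},
}

def is_active_line(fmt, line, field=0):
    """Return True if `line` is a valid active line for `fmt` and field F."""
    ranges = ACTIVE_LINES[fmt]
    if field not in ranges:
        # Progressive formats only use F=0.
        return False
    first, last = ranges[field]
    return first <= line <= last
```

A receiver could use such a check to discard line headers whose scan
line number falls in a blanking interval.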


Sample values for pixels may be transferred as 8 bit or 10 bit values.
{{{ In extended formats 12 bit and 16 bit values are also supported.
For 10 bit and 12 bit payloads,}}} care must be taken such that the
payload is also octet aligned.

However, for video content it is desirable for the video to be both
octet aligned when packetized and also adhere to the principles of
application level framing [ALF]. For YCrCb video, the ALF principle
translates into not fragmenting related luminance and chrominance values
across packets. For example, with 4:2:0 color subsampling each group of
4 pixels is represented by 6 values, Y1 Y2 Y3 Y4 Cr Cb, and video
content should be packetized such that these values are not fragmented
across a packet boundary. With 10 bit words this is a 60 bit value which
is not octet aligned. To be both octet aligned, and appropriately
framed, pixels must be framed in 2 groups of 4, thereby becoming octet
aligned on a 15 octet boundary. This length is referred to as the pixel
group ("pgroup"), and it is conveyed in the SDP parameters. Tables 1
through 4 list the pgroup values for each supported color sampling, for
10, 8, 12 and 16 bit words.

{{{
                                  10 bit words
          Color            --------------------------------
       Subsampling Pixels  #words  octet alignment  pgroup
      +-----------+------+ +------+---------------+-------+
      |monochrome |  4   | | 4x10 |    40/8 = 5   |   5   |
      +-----------+------+ +------+---------------+-------+
      |   4:2:0   |  4   | | 6x10 |  2x60/8 = 15  |  15   |
      +-----------+------+ +------+---------------+-------+
      |   4:2:2   |  2   | | 4x10 |    40/8 = 5   |   5   |
      +-----------+------+ +------+---------------+-------+
      |   4:4:4   |  1   | | 3x10 |  4x30/8 = 15  |  15   |
      +-----------+------+ +------+---------------+-------+
      |  4:4:4:4  |  1   | | 4x10 |    40/8 = 5   |   5   |
      +-----------+------+ +------+---------------+-------+
     Table 1: pgroup values for 10 bit sampling



                                   8 bit words
          Color            --------------------------------
       Subsampling Pixels  #words  octet alignment  pgroup
      +-----------+------+ +------+---------------+-------+
      |monochrome |  1   | | 1x8  |    8/8 = 1    |   1   |
      +-----------+------+ +------+---------------+-------+
      |   4:2:0   |  4   | | 6x8  |  6x8/8 = 6    |   6   |
      +-----------+------+ +------+---------------+-------+
      |   4:2:2   |  2   | | 4x8  |  4x8/8 = 4    |   4   |
      +-----------+------+ +------+---------------+-------+
      |   4:4:4   |  1   | | 3x8  |  3x8/8 = 3    |   3   |
      +-----------+------+ +------+---------------+-------+
      |  4:4:4:4  |  1   | | 4x8  |  4x8/8 = 4    |   4   |
      +-----------+------+ +------+---------------+-------+
     Table 2: pgroup values for 8 bit sampling

                                  12 bit words
          Color            --------------------------------
       Subsampling Pixels  #words  octet alignment  pgroup
      +-----------+------+ +------+---------------+-------+
      |monochrome |  2   | | 2x12 |  2x12/8 = 3   |   3   |
      +-----------+------+ +------+---------------+-------+
      |   4:2:0   |  4   | | 6x12 |    72/8 = 9   |   9   |
      +-----------+------+ +------+---------------+-------+
      |   4:2:2   |  2   | | 4x12 |    48/8 = 6   |   6   |
      +-----------+------+ +------+---------------+-------+
      |   4:4:4   |  2   | | 6x12 |  2x36/8 = 9   |   9   |
      +-----------+------+ +------+---------------+-------+
      |  4:4:4:4  |  1   | | 4x12 |    48/8 = 6   |   6   |
      +-----------+------+ +------+---------------+-------+
     Table 3: pgroup values for 12 bit sampling

                                   16 bit words
          Color            --------------------------------
       Subsampling Pixels  #words  octet alignment  pgroup
      +-----------+------+ +------+---------------+-------+
      |monochrome |  1   | | 1x16 |   16/8 = 2    |   2   |
      +-----------+------+ +------+---------------+-------+
      |   4:2:0   |  4   | | 6x16 | 6x16/8 = 12   |  12   |
      +-----------+------+ +------+---------------+-------+
      |   4:2:2   |  2   | | 4x16 | 4x16/8 = 8    |   8   |
      +-----------+------+ +------+---------------+-------+
      |   4:4:4   |  1   | | 3x16 | 3x16/8 = 6    |   6   |
      +-----------+------+ +------+---------------+-------+
      |  4:4:4:4  |  1   | | 4x16 | 4x16/8 = 8    |   8   |
      +-----------+------+ +------+---------------+-------+
     Table 4: pgroup values for 16 bit sampling

}}}

When packetizing digital active line content, video data MUST NOT be
fragmented within a pgroup.
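
The pgroup values in the tables follow from one rule: repeat the
smallest group of related samples until the total is a whole number of
octets. The Python sketch below is a minimal illustration of that rule.
The names SAMPLING and pgroup() are assumptions made for this example;
the per-mode sample counts follow the "#words" column of the tables.
The pixel count returned is for the whole pgroup, after any repetition.

```python
from math import gcd

# For each sampling mode: (samples in the smallest group of related
# values, pixels covered by that group). Illustrative names only.
SAMPLING = {
    "monochrome": (1, 1),
    "4:2:0": (6, 4),
    "4:2:2": (4, 2),
    "4:4:4": (3, 1),
    "4:4:4:4": (4, 1),
}

def pgroup(sampling, bits):
    """Return (octets, pixels) for the smallest octet-aligned pixel group."""
    samples, pixels = SAMPLING[sampling]
    group_bits = samples * bits
    # Repeat the group until its total size is a whole number of octets.
    repeats = 8 // gcd(group_bits, 8)
    return group_bits * repeats // 8, pixels * repeats
```

For example, pgroup("4:2:0", 10) gives 15 octets covering 8 pixels,
which matches the "2 groups of 4" case described in the text.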

{{{
Video content is almost always associated with additional information
such as audio tracks, time code, etc. In professional digital video
applications this data is commonly embedded in non-video portions of
the data stream (horizontal and vertical blanking periods) so that
precise and robust synchronization is maintained. This payload format
envisions that applications requiring such synchronized ancillary data
should deliver it in separate RTP sessions which operate concurrently
with the video session. In order to maintain robust synchronization in
the face of equipment which changes RTP timestamps, and to support
off-speed, reverse, or other "trick" playout, this payload allows the
sender to insert 32-bit frame numbers into the stream. [Note:
applications which require embedded ancillary data should consider an
alternative payload format such as [292RTP].]
}}}

4.  RTP Packetization

The standard RTP header is followed by a payload header {{{containing a
 4 octet frame number,}}} and an 8 octet header for each
line (or partial line) of video included. One or more lines, or partial
lines, of payload data follow. For example, if two lines of video are
encapsulated, the payload format will be as shown in Figure 1.


       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      | V |P|X|   CC  |M|    PT       |           Sequence No         |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                           Time Stamp                          |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                             SSRC                              |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
{{{   |                         Frame number                          | }}}
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    Scan Line No               |        Scan Offset            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Length                |F|M|         Z                 |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |    Scan Line No               |        Scan Offset            |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |         Length                |F|M|         Z                 |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      .                                                               .
      .                 Two (partial) lines of video data             .
      .                                                               .
      +---------------------------------------------------------------+
     Figure 1: RTP Payload Format showing two (partial) lines of video




Gharai/Perkins                                                  [Page 4]

INTERNET-DRAFT             Expires: April 2003              October 2002


4.1.  The RTP Header

The fields of the fixed RTP header have their usual meaning, with the
following additional notes:


Payload Type (PT): 7 bits

     A dynamically allocated payload type field which designates the
     payload as uncompressed video.


Timestamp: 32 bits

     A 90 kHz timestamp MUST be used to denote the sampling instant of
     the video frame to which the RTP packet belongs. Packets MUST NOT
     include data from multiple frames, and all packets belonging to the
     same frame MUST have the same timestamp. {{{ [Consider whether the two
     fields of interlaced video MAY have distinct timestamps. In
     some ways this is more "natural" for true interlaced video and
     distinguishes it from "progressive segmented frame" (PsF) mode
     in which the two fields really do refer to the same time instant.] }}}


Marker bit (M): 1 bit

     The Marker bit denotes the end of a video frame, and MUST be set to
     1 for the last packet of the video frame. It MUST be set to 0 for
     other packets.
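
As an illustration of the 90 kHz timestamp rule, the Python sketch
below computes the timestamp shared by every packet of a given frame.
The function name and its parameters are assumptions for this example;
"initial" stands in for the initial timestamp offset chosen by the
sender. One timestamp per frame is assumed, as in the unmarked text.

```python
from fractions import Fraction

RTP_CLOCK_HZ = 90000  # clock rate required by the Timestamp field

def frame_timestamp(frame_index, frame_rate, initial=0):
    """Return the 32-bit RTP timestamp shared by all packets of one frame.

    `frame_rate` may be an int or a Fraction such as Fraction(30000, 1001).
    """
    ticks = Fraction(RTP_CLOCK_HZ) * frame_index / Fraction(frame_rate)
    # RTP timestamps are 32 bits wide and wrap around.
    return (initial + round(ticks)) % 2**32
```

At 30000/1001 frames per second each frame advances the timestamp by
3003 ticks; at 25 frames per second the step is 3600 ticks.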


4.2.  Payload Header

{{{
Frame Number : 32 bits

     Frame number of the encapsulated data in network byte order.
     Successive RTP packets MAY contain parts of the same frame.
     A single RTP packet MUST NOT contain parts of two different
     frames. The frame number SHOULD increment by one for every video
     frame (every two fields, for interlaced formats) in normal
     play time sequence. }}}

Scan Line No : 16 bits

     Scan line number of encapsulated data in network byte order.
     Successive RTP packets MAY contain parts of the same scan line
     (with an incremented RTP sequence number, but the same timestamp),
     if it is necessary to fragment a line.


Scan Offset : 16 bits

     Sample number of the co-sited luminance sample (if YUV format data
     is being transported), or the red sample (if RGB format data is
     transported) where the scan line is fragmented, in network byte
     order. {{{ [Clarify - is this the sample number of the first sample
     in the packet? Or could it alternatively identify the last sample
     of a packet? Is the first sample number of a line 0 or 1? What
     is the entry in a header describing a non-fragmented line? The
     answers may be obvious to you but spell them out anyway.] }}}


Length: 16 bits

     Number of octets of data included. This MUST be a multiple of the
     pgroup value.


Field Identification (F): 1 bit

     Identifies which field the scan line belongs to, for interlaced
     data.  F=0 identifies the first field and F=1 the second field.
     For progressive data (SMPTE 296M) F MUST always be set to zero.


Follow On (more lines) bit (M): 1 bit

     Determines if an additional payload header follows the current
     header in the RTP packet. Set to 1 if an additional header follows,
     implying that the RTP packet is carrying data for more than one
     scan line. Set to 0 otherwise.


Reserved (Z): 14 bits

     These bits SHOULD be set to zero by the sender and MUST be ignored
     by receivers.
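
The payload header layout of Figure 1, including the proposed 4 octet
frame number, can be sketched with Python's struct module as follows.
This is an illustrative sketch, not a reference implementation; the
function names and the tuple layout are assumptions made for this
example.

```python
import struct

def pack_payload_header(frame_number, lines):
    """Build the payload header for one packet.

    `lines` is a list of (scan_line, scan_offset, length, field) tuples,
    one per line or partial line carried in the packet.
    """
    out = struct.pack("!I", frame_number)        # network byte order
    for i, (scan_line, scan_offset, length, field) in enumerate(lines):
        more = 1 if i < len(lines) - 1 else 0    # Follow On bit (M)
        flags = (field << 15) | (more << 14)     # F, M, then 14 zero Z bits
        out += struct.pack("!HHHH", scan_line, scan_offset, length, flags)
    return out

def parse_payload_header(data):
    """Return (frame_number, lines, header_size) parsed from `data`."""
    (frame_number,) = struct.unpack_from("!I", data, 0)
    pos, lines = 4, []
    while True:
        scan_line, scan_offset, length, flags = struct.unpack_from(
            "!HHHH", data, pos)
        pos += 8
        lines.append((scan_line, scan_offset, length, flags >> 15))
        if not (flags >> 14) & 1:   # M = 0: last line header; Z is ignored
            break
    return frame_number, lines, pos
```

The header_size returned by the parser is the offset at which the video
data for the first listed line begins.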


4.3.  Payload Data

Depending on the video format, each RTP packet can include either a
single complete scan line, a single fragment of a scan line, or one (or
more) complete scan lines plus a fragment of a scan line. {{{ Every scan
line or scan line fragment MUST begin at an octet boundary in the payload
data. }}}
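
One way to fragment a scan line so that no pgroup is split is sketched
below in Python. The function name and the max_payload_octets parameter
(the octets left for video data once all headers are accounted for) are
assumptions made for this example. The sketch works in octet offsets
only; converting these to Scan Offset values depends on how that field
is finally defined.

```python
def fragment_line(line_octets, pgroup_octets, max_payload_octets):
    """Split one scan line into (octet_offset, length) fragments.

    Every fragment length is a multiple of `pgroup_octets`, so no pgroup
    is split across packets.
    """
    if line_octets % pgroup_octets:
        raise ValueError("line must be a whole number of pgroups")
    # Largest whole number of pgroups that fits in one packet's payload.
    chunk = (max_payload_octets // pgroup_octets) * pgroup_octets
    if chunk == 0:
        raise ValueError("payload budget is smaller than one pgroup")
    return [(off, min(chunk, line_octets - off))
            for off in range(0, line_octets, chunk)]
```

For a 1920 pixel line of 10 bit 4:4:4 video (7200 octets, pgroup of 15
octets) and a budget of 1400 octets per packet, this yields five
fragments of 1395 octets and a final fragment of 225 octets.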

If the video is in YUV format, the packing of samples into the payload
depends on the color sub-sampling used. For RGB format video, there is a
single packing scheme.

For RGB format video, samples are packed in order Red-Green-Blue. {{{All
samples are the same bit size, which may be 8, 10, 12, or 16 bits. }}}
If 8 bit samples are used, the pgroup is 3 octets. If 10 bit samples
are used, samples from adjacent pixels are packed with no padding,
and the pgroup is 15 octets (4 pixels). Refer to Tables 1 through 4.

{{{
For RGBA format video, samples are packed in order Red-Green-Blue-Alpha.
All samples are the same bit size, which may be 8, 10, 12, or 16 bits.
Refer to Tables 1 through 4. }}}


For YUV 4:4:4 format video, samples are packed in order Cb-Y-Cr. Each
sample is either an 8 bit or a 10 bit value.  If 8 bit samples are used,
the pgroup is 3 octets. If 10 bit samples are used, samples from
adjacent pixels are packed with no padding, and the pgroup is 15 octets
(4 pixels).

For YUV 4:2:2 format video, the Cb and Cr components are horizontally
sub-sampled by a factor of two (each Cb and Cr sample corresponds to
two Y components). Samples are packed in order Cb0-Y0-Cr0-Y1. If 8 bit
samples are used, the pgroup is 4 octets. If 10 bit samples are used,
the pgroup is 5 octets.
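
The "no padding" packing described above can be sketched in Python as
follows. Note the assumption: the text does not state the bit order
within an octet, so this example packs most significant bit first. The
function names are illustrative only.

```python
def pack_samples(samples, bits):
    """Pack equal-width samples into octets with no padding between them.

    ASSUMPTION: most-significant-bit-first order is used for
    illustration; the text above does not specify the bit order.
    """
    total_bits = len(samples) * bits
    if total_bits % 8:
        raise ValueError("samples do not fill a whole number of octets")
    value = 0
    for s in samples:
        if not 0 <= s < (1 << bits):
            raise ValueError("sample out of range")
        value = (value << bits) | s
    return value.to_bytes(total_bits // 8, "big")

def unpack_samples(data, bits):
    """Inverse of pack_samples for data holding whole pgroups."""
    value = int.from_bytes(data, "big")
    count = len(data) * 8 // bits
    mask = (1 << bits) - 1
    return [(value >> (bits * (count - 1 - i))) & mask for i in range(count)]
```

Passing one 10 bit 4:2:2 group in the order Cb0, Y0, Cr0, Y1 produces a
5 octet pgroup; passing twelve 10 bit samples (four RGB or 4:4:4
pixels) produces a 15 octet pgroup.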

(tbd: YUV 4:2:0 format video)

{{{
It is possible that the scan line length is not evenly divisible by
the number of pixels in a pgroup, so the final pixel data of a scan
line does not align to either an octet or pgroup boundary. Nonetheless
the payload MUST contain a whole number of pgroups; the sender SHOULD
fill the remaining bits of the final pgroup with zero and the receiver
SHOULD ignore the fill data. (In effect, the trailing edge of the image
is black-filled to a pgroup boundary.)
}}}
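
The zero fill rule can be expressed as a short calculation. In the
Python sketch below, the function name is an assumption made for this
example, and the 1366 pixel line width in the usage note is purely
hypothetical.

```python
def padded_line_octets(pixels_per_line, pgroup_pixels, pgroup_octets):
    """Return (pgroups, octets, fill_pixels) for one zero-filled scan line."""
    pgroups = -(-pixels_per_line // pgroup_pixels)   # ceiling division
    fill_pixels = pgroups * pgroup_pixels - pixels_per_line
    return pgroups, pgroups * pgroup_octets, fill_pixels
```

With a pgroup of 4 pixels in 15 octets, a 1920 pixel line needs 480
pgroups (7200 octets) and no fill, while a hypothetical 1366 pixel line
needs 342 pgroups (5130 octets) with 2 fill pixels.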

5.  Required Parameters

(tbd)


     Parameters are: color mode (RGB/YUV), color sub-sampling
     (4:4:4, 4:2:2, 4:2:0), lines per frame, pixels per line,
     bits per sample, and
     scan mode (progressive or interlaced). Propose to map these to
     SDP a=fmtp: values.
{{{
     Optional parameters are: colorimetry (primaries, whitepoint,
     reference medium), transfer function (log, gamma, toe
     treatment, black offset), image orientation, capture temporal
     mode (field integration, frame integration, spot scan,
     pushbroom scan). [268], [22028]
}}}     


6.  RTCP Considerations

[...]

7.  IANA Considerations

[...]

9.  Security Considerations

[...]

10.  Relation to RFC 2431


(tbd) [BT656]


11.  Full Copyright Statement

Copyright (C) The Internet Society (2002). All Rights Reserved.

[...]

12.  Authors' Addresses

[...]


Bibliography

[...]
{{{

[268]   Society of Motion Picture and Television Engineers,
        File Format for Digital Moving Picture Exchange (DPX),
        SMPTE 268M-1994. (Currently under revision.)

[372]   Society of Motion Picture and Television Engineers,
        Dual Link 292M Interface for 1920 x 1080 Picture Raster,
        SMPTE 372M-2002.


[292RTP] L. Gharai et al., "RTP Payload Format for SMPTE 292M Video",
        Internet Draft, draft-ietf-avt-smpte292-video-07.txt,
        Work in progress.


[22028] ISO TC42 (Photography), Photography and graphic technology -
        Extended colour encodings for digital image storage,
        manipulation and interchange - Part 1: Architecture and
        requirements, ISO/CD 22028-1, Work in Progress.

}}}

