Re: [AVT] I-D ACTION:draft-ietf-avt-uncomp-video-02.txt
Hi all,
I concur with Chuck, this is looking very good indeed.
Just a few remarks:
*****************************
I am not sure I always understand the "pixels" and "#words" columns (second and third) in the tables.
For example, for monochrome 10 bits, "pixels" is "4", which would hint that it means "number of pixels per pgroup".
But then the next line is also "4" for 4:1:1, while there are 8 pixels per pgroup?
Also, "#words" is 6x10, but these are actually "bits" rather than "words"?
I think these tables would benefit from a few sentences giving the exact definition of each column.
*****************************
The 8 bit words table probably contains a typo in the "octet alignment" of interlaced 4:2:0, which should be "4x8/8 = 4",
no?
*****************************
In section 4.3 it says:
"For RGB format video, samples are packed in order Red-Green-Blue. All
samples are the same bit size, which may be 8, 10, 12, or 16 bits. If 8
bit samples are used, the pgroup is 3 octets. If 10 bit samples are
used, samples from adjacent pixels are packed with no padding, and the
pgroup is 15 octets (4 pixels). Refer to Tables 1 thru 4.
For RGBA format video, samples are packed in order Red-Green-Blue-Alpha.
All samples are the same bit size, which may be 8, 10, 12, or 16 bits.
For pgroups refer to Tables 1 thru 4."
Unfortunately, Tables 1 thru 4 do not explicitly mention any RGB or RGBA format(!)
I think it would be useful to add them; here are the tables
(not sure the notation R:G:B is appropriate; anyway, "RGB" is
what is declared in the MIME type ...)
                                         8 bit words
    Color                 -----------------------------------------
 Subsampling Pixels        #words  octet alignment samples pgroup
                                                           octets
+-----------+------+---+  +-------+---------------+-------+-------+
| R:G:B     |   1  | P |  | 3x8   | 3x8/8 = 3     |   3   |   3   |
+-----------+------+---+  +-------+---------------+-------+-------+
| R:G:B:A   |   1  | P |  | 4x8   | 4x8/8 = 4     |   4   |   4   |
+-----------+------+---+  +-------+---------------+-------+-------+

                                        10 bit words
    Color                 -----------------------------------------
 Subsampling Pixels        #words  octet alignment samples pgroup
                                                           octets
+-----------+------+---+  +-------+---------------+-------+-------+
| R:G:B     |   1  | P |  | 3x10  | 4x30/8 = 15   |  12   |  15   |
+-----------+------+---+  +-------+---------------+-------+-------+
| R:G:B:A   |   1  | P |  | 4x10  | 40/8 = 5      |   4   |   5   |
+-----------+------+---+  +-------+---------------+-------+-------+

                                        12 bit words
    Color                 -----------------------------------------
 Subsampling Pixels        #words  octet alignment samples pgroup
                                                           octets
+-----------+------+---+  +-------+---------------+-------+-------+
| R:G:B     |   1  | P |  | 6x12  | 2x36/8 = 9    |   6   |   9   |
+-----------+------+---+  +-------+---------------+-------+-------+
| R:G:B:A   |   1  | P |  | 4x12  | 48/8 = 6      |   4   |   6   |
+-----------+------+---+  +-------+---------------+-------+-------+

                                        16 bit words
    Color                 -----------------------------------------
 Subsampling Pixels        #words  octet alignment samples pgroup
                                                           octets
+-----------+------+---+  +-------+---------------+-------+-------+
| R:G:B     |   1  | P |  | 3x16  | 3x16/8 = 6    |   3   |   6   |
+-----------+------+---+  +-------+---------------+-------+-------+
| R:G:B:A   |   1  | P |  | 4x16  | 4x16/8 = 8    |   4   |   8   |
+-----------+------+---+  +-------+---------------+-------+-------+
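As a cross-check on the "octet alignment" arithmetic in these tables, here is a minimal Python sketch. It assumes that a pgroup is the smallest whole number of repetitions of a sample unit whose total bit count is a multiple of 8; the name pgroup_size and its parameters are illustrative only and do not come from the draft.

```python
from math import gcd

def pgroup_size(samples_per_unit, bit_depth):
    """Return (samples, octets) for the smallest octet-aligned group.

    samples_per_unit: samples in one unit (3 for R,G,B; 4 for R,G,B,A).
    bit_depth: bits per sample (8, 10, 12 or 16).
    Assumption: samples are packed with no padding, and the unit is
    repeated until the total bit count is a multiple of 8.
    """
    bits = samples_per_unit * bit_depth
    repeat = 8 // gcd(bits, 8)  # smallest repeat making bits * repeat divisible by 8
    return samples_per_unit * repeat, bits * repeat // 8

if __name__ == "__main__":
    for name, unit in (("RGB", 3), ("RGBA", 4)):
        for depth in (8, 10, 12, 16):
            samples, octets = pgroup_size(unit, depth)
            print(f"{name:4} {depth:2} bit: {samples:2} samples, {octets:2} octets")
```

Under that assumption, the function gives the same "samples" and "pgroup octets" values as the rows above, for example 12 samples in 15 octets for 10 bit RGB.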
*****************************
Since we have 4:4:4:4, one thing that should be added is 4:4:2:0,
which is 4:2:0 (the most common raw video format,
since it is the raw format of most codecs) plus
a transparency plane with one transparency word
per pixel, i.e. the same amount of transparency as luminance.
This can be *very* useful for studio applications and is also
directly compressible using the profiles and levels of
MPEG-4 video that support the "shape" tool.
This simply requires adding 4:4:2:0 to the list of valid values (section 6.1);
in section 3 the relevant tables would be:
                                         8 bit words
    Color                 -----------------------------------------
 Subsampling Pixels        #words  octet alignment samples pgroup
                                                           octets
+-----------+------+---+  +-------+---------------+-------+-------+
| 4:4:2:0   |   4  | P |  | 10x8  | 80/8 = 10     |  10   |  10   |
+-----------+------+---+  +-------+---------------+-------+-------+
| 4:4:2:0   |   4  | I |  | 8x8   | 64/8 = 8      |   8   |   8   |
+-----------+------+---+  +-------+---------------+-------+-------+

                                        10 bit words
    Color                 -----------------------------------------
 Subsampling Pixels        #words  octet alignment samples pgroup
                                                           octets
+-----------+------+---+  +-------+---------------+-------+-------+
| 4:4:2:0   |   4  | P |  | 10x10 | 2x100/8 = 25  |  20   |  25   |
+-----------+------+---+  +-------+---------------+-------+-------+
| 4:4:2:0   |   4  | I |  | 8x10  | 80/8 = 10     |   8   |  10   |
+-----------+------+---+  +-------+---------------+-------+-------+

                                        12 bit words
    Color                 -----------------------------------------
 Subsampling Pixels        #words  octet alignment samples pgroup
                                                           octets
+-----------+------+---+  +-------+---------------+-------+-------+
| 4:4:2:0   |   4  | P |  | 10x12 | 120/8 = 15    |  10   |  15   |
+-----------+------+---+  +-------+---------------+-------+-------+
| 4:4:2:0   |   4  | I |  | 8x12  | 96/8 = 12     |   8   |  12   |
+-----------+------+---+  +-------+---------------+-------+-------+

                                        16 bit words
    Color                 -----------------------------------------
 Subsampling Pixels        #words  octet alignment samples pgroup
                                                           octets
+-----------+------+---+  +-------+---------------+-------+-------+
| 4:4:2:0   |   4  | P |  | 10x16 | 160/8 = 20    |  10   |  20   |
+-----------+------+---+  +-------+---------------+-------+-------+
| 4:4:2:0   |   4  | I |  | 8x16  | 128/8 = 16    |   8   |  16   |
+-----------+------+---+  +-------+---------------+-------+-------+
regards,
Philippe Gentric
Software Architect
Philips MP4Net
philippe.gentric@philips.com
http://www.platform4.philips.com
From: Chuck Harrison <cfharr@erols.com>
Sent by: avt-admin@ietf.org
Date: 05/03/2003 19:19
To: avt@ietf.org, Ladan Gharai <ladan@east.isi.edu>
cc: (bcc: Philippe Gentric/MP4-SUR/CE/PHILIPS)
Subject: Re: [AVT] I-D ACTION:draft-ietf-avt-uncomp-video-02.txt
Classification:
Hi, all,
A few comments on this draft, which is looking very good so far.
---
In uncomp-video-02 we have:
4.3. Payload Data
Depending on the video format, each RTP packet can include either a
single complete scan line, a single fragment of a scan line, or one (or
more) complete scan lines plus a fragment of a scan line. [...]
I think there are additional logical possibilities allowed by
the document as it stands: a packet may begin with the tail
fragment of a line, then continue with 0 or more full lines, and
possibly the head fragment of another line.
If the intent is to exclude these possibilities, we need normative
language to say so. If these operational modes are permitted, the
quoted text above is misleading and should be amended.
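To make these additional possibilities concrete, here is a minimal Python sketch of a sender that simply fills each packet greedily. It is hypothetical: packetize, its parameters and the (line, offset, length) tuples are assumptions for illustration, and the sketch ignores the payload headers and any pgroup alignment the draft requires.

```python
def packetize(num_lines, line_octets, payload_octets):
    """Greedily pack scan lines into fixed-size payloads.

    Returns a list of packets; each packet is a list of
    (line_number, offset_in_line, length) segments, all in octets.
    Assumption: no per-segment headers and no alignment constraints.
    """
    packets, current, room = [], [], payload_octets
    for line in range(num_lines):
        offset = 0
        while offset < line_octets:
            take = min(room, line_octets - offset)
            current.append((line, offset, take))
            offset += take
            room -= take
            if room == 0:  # packet is full, start a new one
                packets.append(current)
                current, room = [], payload_octets
    if current:  # last, partially filled packet
        packets.append(current)
    return packets

if __name__ == "__main__":
    for number, packet in enumerate(packetize(6, 10, 27)):
        print(number, packet)
```

With 6 lines of 10 octets and a 27 octet payload, the second packet comes out as the tail of line 2, then lines 3 and 4 in full, then the head of line 5, which is exactly the case the quoted text does not mention.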
---
On page 3 of the draft, we have
[...] This payload format
requires that applications using such synchronized ancillary data MUST
deliver it in separate RTP sessions which operate concurrently with the
video session. The normal RTP mechanisms SHOULD be used to synchronize
the media.
I believe the MUST is inappropriate and SHOULD is preferable.
The idea that frame-accurate timecode will be carried as a
parallel RTP session is a laudable but forward-looking conjecture.
IMHO reliable use of this payload format in the near future
will depend on carriage of vertical interval timecode (VITC)
in an unused scan line which is part of the vertical blanking
interval. (This is normal video operational practice, as
mentioned in RFC 2431.)
At the top of page 2 of the draft, we have "...the contents of
horizontal and vertical blanking SHOULD NOT be transported." This
does not bother me as SHOULD NOT language does not prevent me from
carrying VITC when I need it, until the multisession RTP sync
issues get ironed out. On the other hand the MUST language of
page 3 is burdensome.
---
<nitpick>
A minor MUST/SHOULD nit at sec. 4.3 regarding filling to a pgroup
boundary: since a receiver MUST ignore the fill data, it seems
unnecessary that the sender MUST fill with zeroes, specifically.
The sender MUST fill, and SHOULD fill with zeroes.
</nitpick>
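A minimal Python sketch of why the fill value need not be normative, assuming the fill is appended so that the data length becomes a multiple of the pgroup size and that the receiver already knows the active length. The names pad_to_pgroup and strip_fill are hypothetical and not taken from the draft.

```python
def pad_to_pgroup(data: bytes, pgroup_octets: int, fill: int = 0) -> bytes:
    """Sender side: append fill octets up to the next pgroup boundary."""
    shortfall = -len(data) % pgroup_octets  # 0 when already aligned
    return data + bytes([fill]) * shortfall

def strip_fill(padded: bytes, active_octets: int) -> bytes:
    """Receiver side: keep only the active octets, ignoring the fill."""
    return padded[:active_octets]

if __name__ == "__main__":
    active = bytes(range(13))  # 13 octets of sample data
    for fill in (0x00, 0xFF):  # zero fill and a non-zero fill
        padded = pad_to_pgroup(active, 15, fill)
        assert len(padded) == 15
        assert strip_fill(padded, len(active)) == active
```

Since the receiver drops the fill without inspecting it, the recovered data is the same whatever value the sender used.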
---
The MIME and SDP parameters do not include the video frame rate.
There are many plausible scenarios in which it is important
to distinguish alternative frame rates. For example, choosing
the right VOD stream during RTSP setup would need something like
this.
---
Cheers,
Chuck Harrison
Far Field Associates, LLC
_______________________________________________
Audio/Video Transport Working Group
avt@ietf.org
https://www1.ietf.org/mailman/listinfo/avt