Hi.

I have reviewed this document as part of the security directorate's
ongoing effort to review all IETF documents being processed by the
IESG.  These comments were written primarily for the benefit of the
security area directors.  Document editors and WG chairs should treat
these comments just like any other last call comments.

The document defines how SPIRIT IP-MR encoded speech signals can be
transported over RTP. The security considerations seem to be adequate.
However, I am concerned about the C code in the appendix extracting
frame information. The code does not seem to do proper bound checking,
which I think is a problem that needs to be fixed. I understand that
the frame size is an out parameter - still the size of the buffer
passed via pCoded should be available so that proper bound checking
can be performed.

Other than that, I noticed a number of editorial issues, mostly due to
missing articles etc. I am attaching a unified context diff correcting
some of the issues (but note that I stopped making changes at the end
of section 3 - so there is likely more to fix).

/js

-- 
Juergen Schoenwaelder           Jacobs University Bremen gGmbH
Phone: +49 421 200 3587         Campus Ring 1, 28759 Bremen, Germany
Fax:   +49 421 200 3103         <

http://www.jacobs-university.de/

>


--- draft-ietf-avt-rtp-ipmr-12.txt	2010-03-05 14:34:16.000000000 +0100
+++ draft-ietf-avt-rtp-ipmr-12-js.txt	2010-03-05 14:51:08.000000000 +0100
@@ -124,7 +124,7 @@
 
 2. IP-MR Codec Description 
 
-The IP-MR codec is scalable adaptive multi-rate wideband speech codec 
+The IP-MR codec is a scalable adaptive multi-rate wideband speech codec 
 designed by SPIRIT for use in IP based networks. These codec is suitable
 for real time communications such as telephony and videoconferencing. 
 
@@ -141,12 +141,12 @@
 layer and several enhancement layers which are coded independently. 
 Only the core layer is mandatory to decode understandable speech and
 upper layers provide quality enhancement. These enhancement layers 
-may be omitted and remaining base layer can be meaningfully decoded
+may be omitted and the remaining base layer can be meaningfully decoded
 without artifacts. This makes the bit stream scalable and allows 
 to reduce bit rate during transmission without re-encoding. 
 
 This memo specifies an optional form of redundancy coding within RTP 
-for protection against packet loss. It is based on commonly known 
+for protection against packet loss. It is based on a commonly known 
 scheme when previously transmitted frames are aggregated together 
 with new ones. Each frame is retransmitted once in the following 
 RTP payload packet. f(n-2)...f(n+4) denotes a sequence of speech 
@@ -242,7 +242,7 @@
 3.2. Payload Format Structure 
 
 The IP-MR payload format consists of a payload header with general 
-information about packet, a speech table of contents (TOC), and speech
+information about a packet, a speech table of contents (TOC), and speech
 data. An optional redundancy section follows after speech data. The 
 redundancy section consists of redundancy header, redundancy TOC and 
 redundancy data payload. 
@@ -307,7 +307,7 @@
 
     o D (1 bit): reserved. Must be always set to 1. 
       Previously, this bit indicated DTX mode availability, but in fact
-      payload dublicates this information.
+      payload duplicates this information.
 
     o A (1 bit): reserved. Must be always set to 1. 
       Previously, this bit indicated aligned mode, but this mode has 
@@ -322,7 +322,7 @@
 
 3.4. Speech Table of Contents 
 
-The speech TOC contains entries for each frame in packet (grouping size
+The speech TOC contains entries for each frame in a packet (grouping size
 in total). Each entry contains a single field: 
 
                                    0
@@ -341,7 +341,7 @@
       lost frame itself or by empty frames generated by the encoder 
       during silence intervals in DTX mode. 
 
-Note that if CR flag from payload header is 7 (NO_DATA) then speech TOC 
+Note that if the CR flag from the payload header is 7 (NO_DATA) then the speech TOC 
 is empty. 
 
 3.5. Speech Data 
@@ -350,30 +350,30 @@
 noise frames, as specified in the speech TOC of the payload. 
 
 Each speech frame represents 20 ms of speech encoded with the rate 
-indicated in the CR and base rate indicated in BR field of the payload 
+indicated in the CR field and the base rate indicated in the BR field of the payload 
 header. 
 
-The size of coded speech frame is variable due to the nature of codec. 
-The Encoder's algorithm decides what size of each frame is and returns 
+The size of the coded speech frame is variable due to the nature of codec. 
+The Encoder's algorithm decides the size of each frame and returns 
 it after encoding. In order to save bandwidth the size is not placed 
-into payload obviously. The frame size can be determined by frame's 
+into the payload. The frame size can be determined by frame's 
 content using a special service function specified in Appendix A. 
-This function provides complete information about coded frame including
-size, number of layers, size of each layer and size of perceptual 
+This function provides complete information about the coded frame including
+its size, the number of layers, the size of each layer and the size of perceptual 
 sensitive classes.
 
 3.6. Redundancy Header 
 
 If a packet contains redundancy (R field of payload header is 1) the 
-speech data is followed by redundancy header: 
+speech data is followed by the redundancy header: 
 
                              0 1 2 3 4 5
                             +-+-+-+-+-+-+
                             | CL1 | CL2 |
                             +-+-+-+-+-+-+
 
-Redundancy header consists of two fields. Each field contains class 
-specifier for amount of redundancy partly taken from the preceding 
+The redundancy header consists of two fields. Each field contains class 
+specifier for the amount of redundancy partly taken from the preceding 
 packet (CL1) and pre-preceding packet (CL2), e.g. distant from the 
 current packet by 1 and 2 packets accordingly. The values are listed 
 in the table below: 
@@ -409,20 +409,20 @@
 Each specifier takes 3 bits, thus the total redundancy header size is 6 
 bits. 
 
-These classes indicate subjective importance of bits from core layer. 
-Class A contains the bits most sensitive to errors and lost of these 
+These classes indicate the subjective importance of bits from the core layer. 
+Class A contains the bits most sensitive to errors and loss of these 
 bits results in a corrupted speech frame which should not be decoded 
-without applying packet loss concealment (PLC) procedure. Class B is 
+without applying a packet loss concealment (PLC) procedure. Class B is 
 less sensitive than class A and so on to F. Sum of all bit classes 
 from A to F composes core layer.
 
-Putting some part (classes of bits) from previous frame into current 
-packet makes possible to partially decode previous frame in case of 
-it's lost. Than more information is delivered than less speech quality 
-degradation will be. Flags CL1 and CL2 specify how many classes from 
-previous frames current packet contain. E.g. CL1=3 (class C), it means 
-that packet contains bits from classes A, B and C of previous frame. 
-If CL1=6 (class F) then whole core layer is included.
+Putting some part (classes of bits) from a previous frame into the current 
+packet makes it possible to partially decode a previous frame in case it
+got lost. Since more information is delivered, less speech quality 
+degradation will be observed. The flags CL1 and CL2 specify how many classes from 
+previous frames the current packet contain. For example, CL1=3 (class C) means 
+that the packet contains bits from classes A, B and C of the previous frame. 
+If CL1=6 (class F) then the whole core layer is included.
 
 3.7. Redundancy Table of Contents 
 
@@ -432,7 +432,7 @@
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 
 The redundancy TOC contains entries for redundancy frames from preceding 
-and pre-preceding packets. Each entry takes 1 bit like speech TOC entry 
+and pre-preceding packets. Each entry takes 1 bit like the speech TOC entry 
 (3.3):
 
                                    0
@@ -453,21 +453,21 @@
       is equal to the grouping size of the current packet. E.g. maximum 
       number of entries is 4*2 = 8. 
 
-    o If class specifier in the redundancy header is CL=0 (NO_DATA) 
-      then there is no entries for corresponding packet redundancy. 
+    o If the class specifier in the redundancy header is CL=0 (NO_DATA) 
+      then there are no entries for corresponding packet redundancy. 
 
 
 3.8. Redundancy Data 
 
-Redundancy data of a payload contains redundancy information for one or 
+The redundancy data of the payload contains redundancy information for one or 
 more speech frames or comfort noise frames that may be lost during 
 transition, as specified in the redundancy TOC of the payload. Actually 
 redundancy is the most important part of preceding frames representing 
 20 ms of speech. This data MAY be used for partial reconstruction of 
-lost frames. The amount of available redundancy is specified by CL flag 
-in redundancy header section (3.5). This flag SHOULD be passed to 
+lost frames. The amount of available redundancy is specified by the CL flag 
+in the redundancy header section (3.5). This flag SHOULD be passed to the
 decoder. The size of redundancy frame is variable and can be obtained 
-using service function specified in Appendix A.
+using the service function specified in Appendix A.
 
 
 4. Payload Examples