Hi. I have reviewed this document as part of the security directorate's ongoing effort to review all IETF documents being processed by the IESG. These comments were written primarily for the benefit of the security area directors. Document editors and WG chairs should treat these comments just like any other last call comments. The document defines how SPIRIT IP-MR encoded speech signals can be transported over RTP. The security considerations seem to be adequate. However, I am concerned about the C code in the appendix extracting frame information. The code does not seem to do proper bound checking, which I think is a problem that needs to be fixed. I understand that the frame size is an out parameter - still the size of the buffer passed via pCoded should be available so that proper bound checking can be performed. Other than that, I noticed a number of editorial issues, mostly due to missing articles etc. I am attaching a unified context diff correcting some of the issues (but note that I stopped making changes at the end of section 3 - so there is likely more to fix). /js -- Juergen Schoenwaelder Jacobs University Bremen gGmbH Phone: +49 421 200 3587 Campus Ring 1, 28759 Bremen, Germany Fax: +49 421 200 3103 < http://www.jacobs-university.de/ > --- draft-ietf-avt-rtp-ipmr-12.txt 2010-03-05 14:34:16.000000000 +0100 +++ draft-ietf-avt-rtp-ipmr-12-js.txt 2010-03-05 14:51:08.000000000 +0100 @@ -124,7 +124,7 @@ 2. IP-MR Codec Description -The IP-MR codec is scalable adaptive multi-rate wideband speech codec +The IP-MR codec is a scalable adaptive multi-rate wideband speech codec designed by SPIRIT for use in IP based networks. These codec is suitable for real time communications such as telephony and videoconferencing. @@ -141,12 +141,12 @@ layer and several enhancement layers which are coded independently. Only the core layer is mandatory to decode understandable speech and upper layers provide quality enhancement. These enhancement layers -may be omitted and remaining base layer can be meaningfully decoded +may be omitted and the remaining base layer can be meaningfully decoded without artifacts. This makes the bit stream scalable and allows to reduce bit rate during transmission without re-encoding. This memo specifies an optional form of redundancy coding within RTP -for protection against packet loss. It is based on commonly known +for protection against packet loss. It is based on a commonly known scheme when previously transmitted frames are aggregated together with new ones. Each frame is retransmitted once in the following RTP payload packet. f(n-2)...f(n+4) denotes a sequence of speech @@ -242,7 +242,7 @@ 3.2. Payload Format Structure The IP-MR payload format consists of a payload header with general -information about packet, a speech table of contents (TOC), and speech +information about a packet, a speech table of contents (TOC), and speech data. An optional redundancy section follows after speech data. The redundancy section consists of redundancy header, redundancy TOC and redundancy data payload. @@ -307,7 +307,7 @@ o D (1 bit): reserved. Must be always set to 1. Previously, this bit indicated DTX mode availability, but in fact - payload dublicates this information. + payload duplicates this information. o A (1 bit): reserved. Must be always set to 1. Previously, this bit indicated aligned mode, but this mode has @@ -322,7 +322,7 @@ 3.4. Speech Table of Contents -The speech TOC contains entries for each frame in packet (grouping size +The speech TOC contains entries for each frame in a packet (grouping size in total). Each entry contains a single field: 0 @@ -341,7 +341,7 @@ lost frame itself or by empty frames generated by the encoder during silence intervals in DTX mode. -Note that if CR flag from payload header is 7 (NO_DATA) then speech TOC +Note that if the CR flag from the payload header is 7 (NO_DATA) then the speech TOC is empty. 3.5. Speech Data @@ -350,30 +350,30 @@ noise frames, as specified in the speech TOC of the payload. Each speech frame represents 20 ms of speech encoded with the rate -indicated in the CR and base rate indicated in BR field of the payload +indicated in the CR field and the base rate indicated in the BR field of the payload header. -The size of coded speech frame is variable due to the nature of codec. -The Encoder's algorithm decides what size of each frame is and returns +The size of the coded speech frame is variable due to the nature of codec. +The Encoder's algorithm decides the size of each frame and returns it after encoding. In order to save bandwidth the size is not placed -into payload obviously. The frame size can be determined by frame's +into the payload. The frame size can be determined by frame's content using a special service function specified in Appendix A. -This function provides complete information about coded frame including -size, number of layers, size of each layer and size of perceptual +This function provides complete information about the coded frame including +its size, the number of layers, the size of each layer and the size of perceptual sensitive classes. 3.6. Redundancy Header If a packet contains redundancy (R field of payload header is 1) the -speech data is followed by redundancy header: +speech data is followed by the redundancy header: 0 1 2 3 4 5 +-+-+-+-+-+-+ | CL1 | CL2 | +-+-+-+-+-+-+ -Redundancy header consists of two fields. Each field contains class -specifier for amount of redundancy partly taken from the preceding +The redundancy header consists of two fields. Each field contains class +specifier for the amount of redundancy partly taken from the preceding packet (CL1) and pre-preceding packet (CL2), e.g. distant from the current packet by 1 and 2 packets accordingly. The values are listed in the table below: @@ -409,20 +409,20 @@ Each specifier takes 3 bits, thus the total redundancy header size is 6 bits. -These classes indicate subjective importance of bits from core layer. -Class A contains the bits most sensitive to errors and lost of these +These classes indicate the subjective importance of bits from the core layer. +Class A contains the bits most sensitive to errors and loss of these bits results in a corrupted speech frame which should not be decoded -without applying packet loss concealment (PLC) procedure. Class B is +without applying a packet loss concealment (PLC) procedure. Class B is less sensitive than class A and so on to F. Sum of all bit classes from A to F composes core layer. -Putting some part (classes of bits) from previous frame into current -packet makes possible to partially decode previous frame in case of -it's lost. Than more information is delivered than less speech quality -degradation will be. Flags CL1 and CL2 specify how many classes from -previous frames current packet contain. E.g. CL1=3 (class C), it means -that packet contains bits from classes A, B and C of previous frame. -If CL1=6 (class F) then whole core layer is included. +Putting some part (classes of bits) from a previous frame into the current +packet makes it possible to partially decode a previous frame in case it +got lost. Since more information is delivered, less speech quality +degradation will be observed. The flags CL1 and CL2 specify how many classes from +previous frames the current packet contain. For example, CL1=3 (class C) means +that the packet contains bits from classes A, B and C of the previous frame. +If CL1=6 (class F) then the whole core layer is included. 3.7. Redundancy Table of Contents @@ -432,7 +432,7 @@ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The redundancy TOC contains entries for redundancy frames from preceding -and pre-preceding packets. Each entry takes 1 bit like speech TOC entry +and pre-preceding packets. Each entry takes 1 bit like the speech TOC entry (3.3): 0 @@ -453,21 +453,21 @@ is equal to the grouping size of the current packet. E.g. maximum number of entries is 4*2 = 8. - o If class specifier in the redundancy header is CL=0 (NO_DATA) - then there is no entries for corresponding packet redundancy. + o If the class specifier in the redundancy header is CL=0 (NO_DATA) + then there are no entries for corresponding packet redundancy. 3.8. Redundancy Data -Redundancy data of a payload contains redundancy information for one or +The redundancy data of the payload contains redundancy information for one or more speech frames or comfort noise frames that may be lost during transition, as specified in the redundancy TOC of the payload. Actually redundancy is the most important part of preceding frames representing 20 ms of speech. This data MAY be used for partial reconstruction of -lost frames. The amount of available redundancy is specified by CL flag -in redundancy header section (3.5). This flag SHOULD be passed to +lost frames. The amount of available redundancy is specified by the CL flag +in the redundancy header section (3.5). This flag SHOULD be passed to the decoder. The size of redundancy frame is variable and can be obtained -using service function specified in Appendix A. +using the service function specified in Appendix A. 4. Payload Examples