[xrblock] Video loss concealment support in draft-ietf-xrblock-rtcp-xr-concsec
Qin Wu <bill.wu@huawei.com> Tue, 16 October 2012 06:31 UTC
From: Qin Wu <bill.wu@huawei.com>
To: xrblock@ietf.org
Date: Tue, 16 Oct 2012 14:30:51 +0800
Hi,

In order to support video loss concealment, I would like to propose the following changes to draft-ietf-xrblock-rtcp-xr-concsec:

1. Abstract

OLD TEXT:
"
This document defines an RTP Control Protocol(RTCP) Extended Report (XR) Block that allows the reporting of Concealed Seconds metrics for a range of RTP applications primarily for audio applications of RTP.
"

NEW TEXT:
"
This document defines an RTP Control Protocol (RTCP) Extended Report (XR) Block that allows the reporting of Concealed Seconds metrics for a range of RTP applications.
"

2. Section 1.1, Editor's Note

OLD TEXT:
"
At any instant, the audio output at a receiver may be classified as either 'normal' or 'concealed'. 'Normal' refers to playout of audio payload received from the remote end, and also includes locally generated signals such as announcements, tones and comfort noise. Concealment refers to playout of locally-generated signals used to mask the impact of network impairments such as lost packets or to reduce the audibility of jitter buffer adaptations.

Editor's Note: For video application, the output at a receiver should also be classified as either normal or concealed. Should this paragraph be clear about this?
"

NEW TEXT:
"
At any instant, the media output at a receiver may be classified as either 'normal' or 'concealed'. 'Normal' refers to playout of media payload received from the remote end, and also includes locally generated signals such as announcements, tones and comfort noise. Concealment refers to playout of locally-generated signals used to mask the impact of network impairments such as lost packets or to reduce the discontinuities in the media playout (e.g., audibility of jitter buffer adaptations).
"

3. Section 1.4, Editor's Note

OLD TEXT:
"
This metric is primarily applicable to audio applications of RTP.

EDITOR'S NOTE: are there metrics for concealment of transport errors for video.
"

NEW TEXT:
"
These metrics are primarily applicable to audio applications of RTP. In addition, these metrics are also applicable to concealment of transport errors in video applications of RTP.
"

4. Section 2.1, Editor's Note

OLD TEXT:
"
2.1. Standards Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

In addition, the following terms are defined:

Editor's Note: For Video loss concealment, at least the following four methods are used,i.e., Frame freeze,inter-frame extrapolation, interpolation, Noise insertation, should this section consider giving definition of these four methods for video loss concealment?
"

NEW TEXT:
"
2.1. Standards Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].

In addition, the following terms are defined:

   Frame freeze

      The impaired video frame is not displayed; instead, the previously displayed frame is "frozen" for the duration of the loss event.

   Inter-frame extrapolation

      If an area of the video frame is damaged by loss, the same area from the previous frame(s) can be used to estimate what the missing pixels would have been. This can work well in a scene with no motion but can be very noticeable if there is significant movement from one frame to another. Simple decoders may simply re-use the pixels that were in the missing area, while more complex decoders may try to use several frames to do a more complex extrapolation.

   Interpolation

      A decoder may use the undamaged pixels in the image to estimate what the missing block of the image should have been.

   Noise insertion

      A decoder may insert random pixel values, which would generally be less noticeable than a blank rectangle in the image.
"

5. Section 3, 1st paragraph, Editor's Note

OLD TEXT:
"
This sub-block provides a description of potentially audible impairments due to lost and discarded packets at the endpoint, expressed on a time basis analogous to a traditional PSTN T1/E1 errored seconds metric.

Editor's Note: Should impairment also cover video application?
"

NEW TEXT:
"
This sub-block provides a description of potential network impairments due to lost and discarded packets at the endpoint, expressed on a time basis analogous to a traditional PSTN T1/E1 errored seconds metric.
"

6. Section 3.2, Packet Loss Concealment method definition, Editor's Note

OLD TEXT:
"
Packet Loss Concealment Method (plc): 2 bits

   This field is used to identify the packet loss concealment method in use at the receiver, according to the following code:

   bits 014-015

      0 = silence insertion
      1 = simple replay, no attenuation
      2 = simple replay, with attenuation
      3 = enhanced

      Other values reserved

Editor's Note 1 : In the packet loss concealment methods,"Enhanced" is defines as one new Packet loss Concealment method? However it is not clear what this packet loss concealment method looks like?

Editor's Note 2: For Video loss concealment, there are a range of methods used, for example:

(i) Frame freeze
In this case the impaired video frame is not displayed and the previously displayed frame is hence "frozen" for the duration of the loss event

(ii) Inter-frame extrapolation
If an area of the video frame is damaged by loss, the same area from the previous frame(s) can be used to estimate what the missing pixels would have been. This can work well in a scene with no motion but can be very noticeable if there is significant movement from one frame to another. Simple decoders may simply re-use the pixels that were in the missing area, more complex decoders may try to use several frames to do a more complex extrapolation.

(iii) Interpolation
A decoder may use the undamaged pixels in the image to estimate what the missing block of image should have

(iv) Noise insertion
A decoder may insert random pixel values - which would generally be less noticeable than a blank rectangle in the image.

Therefore more text required in the future draft to discuss Techniques for Video Loss Concealment method in this document.
"

NEW TEXT:
"
Packet Loss Concealment Method (plc): 4 bits

   This field is used to identify the packet loss concealment method in use at the receiver, according to the following code:

   bits 011-014

      0 = silence insertion (audio)
      1 = simple replay, no attenuation (audio)
      2 = simple replay, with attenuation (audio)
      3 = enhanced (audio)
      4 = frame freeze (video)
      5 = inter-frame extrapolation (video)
      6 = interpolation (video)
      7 = noise insertion (video)

      Other values reserved
"

7. Section 3.2, Unimpaired Seconds, Editor's Note

OLD TEXT:
"
Normal playout of comfort noise or other silence concealment signal during periods of talker silence, if VAD [VAD] is used, shall be counted as unimpaired seconds.

Editor's Note: It should be clear that VAD does not apply to video.
"

NEW TEXT:
"
For speech applications, normal playout of comfort noise or other silence concealment signal during periods of talker silence, if VAD [VAD] is used, shall be counted as unimpaired seconds.
"

8. Section 3.2, Concealed Seconds, Editor's Note

OLD TEXT:
"
Equivalently, a concealed second is one in which some Loss-type concealment has occurred. Buffer adjustment-type concealment SHALL not cause Concealed Seconds to be incremented, with the following exception. An implementation MAY cause Concealed Seconds to be incremented for 'emergency' buffer adjustments made during talkspurts.

Loss-type concealment is reactive insertion or deletion of samples in the audio playout stream due to effective frame loss at the audio decoder. "Effective frame loss" is the event in which a frame of coded audio is simply not present at the audio decoder when required. In this case, substitute audio samples are generally formed, at the decoder or elsewhere, to reduce audible impairment.

Buffer Adjustment-type concealment is proactive or controlled insertion or deletion of samples in the audio playout stream due to jitter buffer adaptation, re-sizing or re-centering decisions within the endpoint. Because this insertion is controlled, rather than occurring randomly in response to losses, it is typically less audible than loss-type concealment. For example, jitter buffer adaptation events may be constrained to occur during periods of talker silence, in which case only silence duration is affected, or sophisticated time-stretching methods for insertion/deletion during favorable periods in active speech may be employed. For these reasons, buffer adjustment-type concealment MAY be exempted from inclusion in calculations of Concealed Seconds and Severely Concealed Seconds.

Editor's Note: In this document, two kind of concealments are defined:
a. Loss-type concealment
b. Buffer Adjustment-type concealment
Loss-type concealment is applicable to both audio and video. However Buffer Adjustment-type concealment is usually applied to audio. Should this section be clear about this?
"

NEW TEXT:
"
Equivalently, a concealed second is one in which some Loss-type concealment has occurred. Buffer adjustment-type concealment is usually applied to audio applications and SHALL NOT cause Concealed Seconds to be incremented, with the following exception. An implementation MAY cause Concealed Seconds to be incremented for 'emergency' buffer adjustments made during talkspurts.

Loss-type concealment is reactive insertion or deletion of samples in the media playout stream due to effective frame loss at the media decoder. "Effective frame loss" is the event in which a frame of coded media is simply not present at the media decoder when required. In this case, substitute media samples are generally formed, at the decoder or elsewhere, to reduce audible or perceivable impairment.

Buffer Adjustment-type concealment is proactive or controlled insertion or deletion of samples in the audio playout stream due to jitter buffer adaptation, re-sizing or re-centering decisions within the endpoint. Because this insertion is controlled, rather than occurring randomly in response to losses, it is typically less audible than loss-type concealment. For example, jitter buffer adaptation events may be constrained to occur during periods of talker silence, in which case only silence duration is affected, or sophisticated time-stretching methods for insertion/deletion during favorable periods in active speech may be employed. For these reasons, buffer adjustment-type concealment MAY be exempted from inclusion in calculations of Concealed Seconds and Severely Concealed Seconds.
"

Regards!
-Qin