Re: [mpls] [mpls-tp] Questions on "draft-ietf-mpls-loss-delay-00"

Ben Niven-Jenkins <ben@niven-jenkins.co.uk> Fri, 28 January 2011 11:57 UTC

From: Ben Niven-Jenkins <ben@niven-jenkins.co.uk>
Date: Fri, 28 Jan 2011 12:00:42 +0000
To: Linda Dunbar <ldunbar@huawei.com>
Cc: mpls-tp@ietf.org, mpls@ietf.org
Subject: Re: [mpls] [mpls-tp] Questions on "draft-ietf-mpls-loss-delay-00"

Linda,


On 11 Jan 2011, at 19:55, Linda Dunbar wrote:

> Ben and colleagues:
> 
> You said "But your suggested edit does not suggest a slower transmission rate; it states the querier should stop transmission",
> 
> Two comments:
> 
> 1.      I think “slower transmission rate by querier” should be included in the text.
> 

So the model would be that a querier starts with a slow transmission rate and increases it to the actual desired transmission rate over time?

I don't see the need for that.

Your concerns that lead to wanting such behaviour (if I have understood them correctly) are:
1) There is a processing cost on the receiver, and that cost may be too high for a given receiver (e.g. the rate is faster than it supports).
2) The sender may waste resources counting before the receiving end is ready.

(1) is common in a number of situations (e.g. control packets that need to be sent to a central CPU and therefore take a slow path through a router) and is generally dealt with by applying a rate limit on that particular traffic on the slow path. I'd argue such an approach is sufficient for LM.
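(For concreteness, the kind of slow-path rate limit I mean could be as simple as the token-bucket sketch below. This is only an illustration in Python; the numbers and the handler name are invented, not from the draft.)

import time

class SlowPathRateLimiter:
    """Token bucket guarding the slow path to the central CPU.
    LM queries arriving faster than the configured rate are dropped
    before they reach the expensive control-plane handler."""
    def __init__(self, rate_pps, burst):
        self.rate = float(rate_pps)   # sustained queries per second
        self.capacity = float(burst)  # maximum burst size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill for the time elapsed since the last query, up to the cap.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

def process_on_slow_path(query):
    pass  # stub standing in for the control-plane LM handler

limiter = SlowPathRateLimiter(rate_pps=10, burst=5)  # illustrative numbers

def on_lm_query(query):
    if limiter.allow():
        process_on_slow_path(query)
    # else: drop silently; forwarding is unaffected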

(2) is a local implementation decision on the querier so let's not bog the document down with implementation specifics.

I would therefore suggest the following:

When initiating an LM operation, the far end may require a period of time to become ready for the requested measurement operation, or the far end may not be able to support the requested measurement. Under those circumstances, LM queries MAY simply be discarded, and the querier expecting a response SHOULD be prepared for this situation. Alternatively, the receiver MAY respond, possibly in a rate-limited manner, to queries received during this period with an appropriate notification code. The querier should abort the measurement if it has received no positive feedback by the time a specified timer expires.
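(To make the querier side of that concrete, here is a hedged sketch, assuming invented hook names rather than any real API; the message formats are placeholders, not the draft's.)

import time

QUERY_INTERVAL = 1.0   # seconds between LM queries (illustrative)
ABORT_TIMEOUT = 30.0   # abort if no positive feedback by then

def run_lm_session(send_query, poll_response):
    # send_query() transmits one LM query; poll_response(timeout)
    # returns None (nothing received), a positive response, or a
    # notification -- placeholder hooks, not the draft's messages.
    deadline = time.monotonic() + ABORT_TIMEOUT
    while time.monotonic() < deadline:
        send_query()
        resp = poll_response(timeout=QUERY_INTERVAL)
        if resp is None:
            continue            # query may simply have been discarded
        if resp.positive:
            return resp         # far end ready; measurement proceeds
        # Otherwise: a (possibly rate-limited) notification that the
        # far end is not ready or cannot support the measurement.
    return None                 # timer expired with no positive feedback: abort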

Your proposed text for 2.7.10 on LM interval looks OK to me.

Ben


> 2.      I suggested that the querier may choose not to start counting (as opposed to not transmitting) until positive feedback from the Far End has been received.
> 
> 
> Based on the discussion over this email thread, do people think the following text is more appropriate?
> 
> 
> When initiating an LM operation, the far end may require a period of time to become ready for the requested measurement operation, or the far end may not be able to support the requested measurement. Under those circumstances, LM queries MAY simply be discarded, and the querier expecting a response SHOULD be prepared for this situation. The frequency of the initial LM queries requesting that the measurement be started should be low, so that there is enough time for the far end to check whether the requested measurement can be performed. The querier should abort the measurement if it has received no positive feedback by the time a specified timer expires; the timer differentiates between an acceptable initialization delay and a permanent unavailability condition at the far end. Alternatively, the receiver MAY respond, possibly in a rate-limited manner, to queries received during this period with an appropriate notification code. Since counting transmitted packets at the querier side costs extra resources (it is not free), the querier need not start counting its transmitted packets until it receives positive feedback from the far end.
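[For illustration only: the receiver-side behaviour described in both versions of this text might look like the sketch below. All names, codes and helpers are invented placeholders, not values or APIs from the draft.]

NOT_READY = 1      # placeholder notification codes,
UNSUPPORTED = 2    # not draft-assigned values

def notification(code):
    return ("NOTIFY", code)        # stand-in for a notification message

def positive_response(query):
    return ("RESPONSE", query)     # stand-in for a positive LM response

def handle_lm_query(query, state, notify_limiter):
    # Receiver side: discard silently, or send a rate-limited
    # notification with an appropriate code, per the proposed text.
    # state.supported()/state.ready() and notify_limiter.allow()
    # are placeholder hooks.
    if not state.supported(query.measurement):
        return notification(UNSUPPORTED) if notify_limiter.allow() else None
    if not state.ready(query.measurement):
        return notification(NOT_READY) if notify_limiter.allow() else None
    return positive_response(query)   # ready: LM proceeds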
> 
> 
> 
> Suggested text for LM/DM Interval:
> 
> 
> 2.7.10: LM Interval
> 
> 
> The interval may affect the accuracy of the packet loss count. From an implementation point of view, the shorter the interval, the higher the processing cost on the equipment. The querier should encode its intended interval in its LM Message. If the interval specified by the LM message can't be supported by the far end, the far end can simply ignore the LM requests by not responding at all, or it can respond with an appropriate notification code indicating its minimum allowed LM interval.
> 
> 
> In the Section 3.1 LM Message Format, add an 8- (or 16-) bit field for LM Interval. The interval unit should be milliseconds.
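[Purely as an illustration of the suggested field: encoding and decoding a 16-bit interval in milliseconds could look like the Python sketch below. The layout follows the suggestion above, not any format actually defined in the draft.]

import struct

def encode_lm_interval(interval_ms):
    # 16-bit unsigned interval in milliseconds, network byte order
    # (illustrative layout per the suggestion above).
    if not 0 <= interval_ms <= 0xFFFF:
        raise ValueError("interval must fit in 16 bits")
    return struct.pack("!H", interval_ms)

def decode_lm_interval(field):
    (interval_ms,) = struct.unpack("!H", field)
    return interval_ms

# e.g. a 100 ms interval encodes as b'\x00\x64'
assert decode_lm_interval(encode_lm_interval(100)) == 100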
> 
> 
> Linda Dunbar
> 
> -----Original Message-----
> From: Ben Niven-Jenkins [mailto:ben@niven-jenkins.co.uk]
> Sent: Monday, January 10, 2011 11:29 AM
> To: Linda Dunbar
> Cc: 'Greg Mirsky'; mpls@ietf.org; mpls-tp@ietf.org
> Subject: Re: [mpls-tp] [mpls] Questions on "draft-ietf-mpls-loss-delay-00"
> 
> Linda,
> 
> On 10 Jan 2011, at 16:28, Linda Dunbar wrote:
> 
> > From: Ben Niven-Jenkins [mailto:ben@niven-jenkins.co.uk]
> 
> >
> 
> 1) The edits suggested by Linda change the semantics from suggesting a querier maintain a timer to allow for the receiving end to "get its act together", to stopping LM transmission if the receiving end doesn't get its act together within X packets. Is that really what's intended? I would expect the time taken for the receiving end to "get its act together" not to be a function of the LM transmission rate, and therefore using the transmission rate to guess whether the receiving end is dead vs. getting itself ready doesn't sound sensible.
> 
> >
> 
> >
> 
> [Linda] The reason for suggesting the Initiator have a slower transmission rate is to minimize the processing needed at the Initiator before the Far End commits to the counting.
> 
> >
> 
> But your suggested edit does not suggest a slower transmission rate; it states the querier should stop transmission!
> 
> We have considered implementing this feature on our products and discovered that it DOES take some time for the Central CPU to receive the Counting Request (the LM in this draft), check whether the requested Counting can be supported, pass the request to the corresponding line card, and get confirmation from the line card. Sometimes the Far End can't perform the requested Counting because it is already performing counting for requests from other nodes.
> 
> >
> 
> I accept that a receiver may take some time to configure itself for counting and that the time it takes may be a function of other things it is doing. What I am saying is that the time it takes is never a function of the interval being configured.
> 
> 2) I don't like the final sentence suggesting queriers should delay counting transmitted packets until receiving a positive response because doing so costs resources. It sounds like an internal implementation decision that has nothing to do with interoperability and is therefore irrelevant.
> 
> >
> 
> >
> 
> [Linda] I agree with your point that whether the Source Node performs the counting before receiving positive feedback from the Far End is not an interoperability issue. Maybe it can be a recommendation?
> 
> >
> 
> But it is necessary for the Source node to abort the effort if it has not received positive feedback after some time (e.g. after X LM messages or when a timer expires).
> 
> >
> 
> At some point, if the querier has not received a positive response, it must abort the operation. The original text suggests maintaining a timer. Your edit suggested making it a function of packets sent. I am saying the former seems more sensible to me than the latter.
> 
> 4) Giving the receiving end the opportunity to reply with "I can't support the rate you're asking for; it's too fast for me" sounds attractive, but I am not sure it is in practice if one thinks about how such functionality is likely to be used. If an operator turns LM on with the thinking that performing LM of some sort is important regardless of the LM interval, then negotiating a longer interval may have value. However, I don't expect that to be the case. I would expect an operator to pre-plan what interval they need given their knowledge of their network and the accuracy they desire; having the network change that may well cause unexpected consequences. In such a scenario I think it's better to let the receiver "hard fail" so the operator can investigate, rather than have the network re-configure and the re-configuration go unnoticed; after all, LM failing does not impact the ability of the network to actually forward packets.
> 
> >
> 
> >
> 
> [Linda] Do you mean that the Receiver can ignore the LMs if the Interval specified in the message is shorter than it can handle?
> 
> That is one option.
> 
> The background to my thinking is:
> 
> I assume that LM will be turned on explicitly (i.e. it will not be on by default)
> 
> I assume that LM is turned on for a reason, e.g. because an operator needs to support an SLA involving packet loss.
> 
> The interval may affect the accuracy of the packet loss count.
> 
> I expect operators would rather have LM "hard fail" than have the network re-configure automatically to a different interval, because an SLA calculation (or whatever else is relying on the LM) could become inaccurate and no one would know unless they really dug deep into what the network was doing.
> 
> The network re-configuring itself in order to repair itself so it can continue to forward packets is one thing.
> 
> The network reconfiguring non-forwarding-related features itself, because it believes it knows what to do better than the person who designed & configured it, could lead to all sorts of trouble in the field IMO.
> 
> Ben
> 