Re: [Hipsec] fault-tolerance for base exchange and update

Miika Komu <miika.komu@hiit.fi> Thu, 07 January 2010 15:14 UTC

Tobias Heer wrote:

Hi,

> Hi,
> 
> Am 07.01.2010 um 08:22 schrieb Miika Komu:
> 
>> Hi,
>> 
>> Baris Boyvat has implemented an experimental fault-tolerance
>> extension for the HIP base exchange and UPDATE in the HIPL
>> implementation. He will document it in his master's thesis during
>> this year, but I would like to start discussing the topic
>> already now.
>> 
> Great. I think this extension is really worth some deeper
> investigation. In our own tests with HIP(L) we found that timing and
> aggressiveness regarding retransmissions and opportunistic double
> transmissions can greatly improve the performance.
> 
>> At the protocol level, the extension allows sending multiple I1 or
>> UPDATE-with-locator packets sequentially. The idea is to scan
>> through all possible source and destination IP pairs at the HIP
>> layer to improve the chances for successful initial contact (I1)
>> and to re-establish contact (UPDATE-with-locator) in a way similar to
>> the NAT-ICE extensions. We have playfully called the extension
>> "shotgun" mode in the implementation :)
>> 
>> The obvious difference to ICE is that the shotgun mode works at the
>> HIP protocol layer. A non-obvious difference is that the approach
>> also supports fault-tolerance for a single relay/rendezvous
>> (Responder's RVS has crashed) and it can make use of multiple
>> relay/rendezvous servers for better redundancy. At the moment,
>> neither of these is possible directly with the ICE-NAT extensions.
>> I actually believe the shotgun approach can be applied even with
>> the ICE-NAT extensions to improve fault-tolerance.
>> 
>> The shotgun approach seems useful to improve fault-tolerance with
>> and without (single or multiple) rendezvous/relay middleboxes, but
>> there is also another use case for this. The Initiator (or Mobile
>> Node) can learn multiple mappings for the peer, some of which may
>> have connectivity and some not. It is also possible that a malicious
>> user intentionally sends invalid mappings for a well-known service
>> in a multiuser system (this case also requires some rate control
>> for mappings per user). In such scenarios, it is useful to try
>> multiple peer addresses sequentially instead of just a single one.
>> 
>> Minimally, the approach requires a few considerations in an
>> implementation:
>> 
>> i) Allow sending of multiple I1 and UPDATE-with-locator packets in
>>    a rate-controlled fashion.
>> ii) Filter redundant incoming packets.
>> 
>> Case (ii) could be implemented as filtering of I1 packets or
>> filtering of R1 packets. We chose filtering of redundant R1 packets
>> in the implementation and it required a small change in the state
>> machine. For the UPDATE filtering, filtering based on sequence
>> numbers was sufficient.
>> 
>> I would like WG feedback on whether we could include this
>> approach in RFC5201-bis and RFC5206-bis (as MAY or SHOULD).
> 
> I would like to see this as a separate document that solely focuses
> on fault-tolerance and performance. I think the shotgun extension is a
> first step to a comprehensive document. My two reasons for this are:
>  a) I think solving the problem goes beyond the scope of the base
>     documents because this problem domain offers more possible
>     solutions than the shotgun mode. A separate document could discuss
>     use cases and solutions in more depth than can be done in the
>     base documents.
>  b) Measures for improving fault tolerance may be quite specific to a
>     scenario and may require making some assumptions that cannot be
>     made in the general case.

Well, I am just a bit worried that this will never be taken into use if
the state machine filtering parts are not part of RFC5201-bis and
RFC5206-bis.
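
To make the filtering part concrete, here is a minimal sketch in Python of what a receiver-side filter could look like. It is not the HIPL code: the class name, the method names and the use of a peer identifier string as the key are all assumptions made for illustration, and a real implementation would do this inside the HIP state machine. It only shows the two ideas mentioned in the original proposal: process the first R1 per peer and drop the redundant ones, and drop UPDATE packets whose sequence number is not newer than the last one accepted.

```python
class DuplicateFilter:
    """Hypothetical filter for redundant R1 and UPDATE packets."""

    def __init__(self):
        # Peers for which one R1 has already been processed.
        self.r1_seen = set()
        # Peer -> highest UPDATE sequence number accepted so far.
        self.last_update_seq = {}

    def accept_r1(self, peer_id):
        """Return True for the first R1 from a peer, False for repeats."""
        if peer_id in self.r1_seen:
            return False
        self.r1_seen.add(peer_id)
        return True

    def accept_update(self, peer_id, seq):
        """Return True only if seq is newer than the last accepted one."""
        if seq <= self.last_update_seq.get(peer_id, -1):
            return False
        self.last_update_seq[peer_id] = seq
        return True
```

With several I1 packets in flight, the Initiator would call accept_r1() for each incoming R1 and continue the base exchange only when it returns True.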

> Some more thoughts on fault tolerance:
> 
> As far as I understood, the shotgun extension only works with
> multiple interfaces. What about optimizations for single-homed hosts?

The shotgun mode does not "care" about interfaces. It pairs up
addresses, not interfaces. So if you've got two addresses on a
single-interface machine and the peer has one, two redundant packets
will be sent.
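
The pairing can be sketched in a few lines of Python. This is only an illustration, not the HIPL code: the function name is made up, the addresses are documentation addresses, and the sketch does not check address families or apply any rate control.

```python
from itertools import product


def address_pairs(local_addrs, peer_addrs):
    """Return every (source, destination) address pair, ignoring interfaces."""
    return list(product(local_addrs, peer_addrs))


# Two local addresses on one interface, one peer address.
pairs = address_pairs(["192.0.2.1", "192.0.2.2"], ["198.51.100.7"])
for src, dst in pairs:
    print(f"I1 {src} -> {dst}")
```

Here the list contains two pairs, so two I1 packets would go out, one per local address.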

> As far as I understood, the shotgun mode will make the mobile devices
> switch interfaces quite aggressively. What happens if the primary
> interface (e.g. WiFi) is temporarily down (because of a recent L2/L3
> handover)? The shotgun mode will determine that the secondary
> interface (e.g., GPRS) is working and will switch to the secondary
> interface? Do we need a mechanism to switch back as soon as the
> primary interface is available again?


This is a matter of the UPDATE policy and has nothing to do with the
shotgun extension we're proposing. The shotgun mode just means that you
send every I1 and UPDATE-with-locator packet over all known source IP
and destination IP combinations. So the shotgun mode is quite dumb and
simple.

But perhaps I just misunderstood you. I haven't really thought about 
optimizing the shotgun - probably there's room for making it more clever.

> Variable timeouts and increased redundancy depending on the current
> situation (e.g., high packet loss -> more redundancy) might be an
> option, too.
> 
> Thanks for the work you already did in this problem domain.

You're welcome.