EXPLISP BOF Wed am
==================

I do minutes and issues.

Agenda.

Why we're here. To form an "exploratory group" a la rfc5111. Host discussions and documents necessary to perform experiments that help the community understand etc.

Milestones. Close July 2009.

Current drafts.

Scope for the BOF (Jari):

- Concern over the scaling of the routing system. Long lasting. IAB 2006 R&A workshop, which led to significant efforts in research community. "State*Rate". PI addressing, multihoming, traffic engineering, v4 and v6 tables, errors etc.
- The Daigle Principle: a network should be able to implement reasonable internetworking choices without unduly impacting another network's operation. Some of today's internetworking only implementable in ways that threaten this principle.
- How to fix?
  - 1. improve routers, electronics etc.
  - 2. new operational practices (e.g. Francis)
  - 3. Engineering changes in routing protocols
  - 4. Architectural improvements. Lots of interest.
- Slide on architectural alternatives. generally aggregate at provider level, but indirection at host -> shim6, ilnp; transport multipath; packet tunneling -> lisp, apt etc.; and rewriting -> six/one router.
- Issues with solutions:
  - connectivity to old internet
  - no routing economy and pushback for new entries; the price is zero just staying with where you are.
  - which nodes have to change
  - do all apps work over it?
  - ability to pass addresses in apps
  - what are the costs, security, complexity
  - caching behavior
- We have RRG working on it since 2007. Currently working through design space. "Very active", but difficult problem. Current discussion is at the right level, debating design.
- Why are we here? There was a request, LISP team wants to move forward. Different mode of operation (I have an implementation) but this BOF request was controversial.
  - Just do in RRG?
  - LISP might not be the right one
  - Jari thinks none of the solutions are ready for prime time.
- Focus on technical, even conceptual, challenges, not details.
- So complement the work in RRG with experiments. Not the intention to replace RRG.
- Outputs: docs that describe where experimentation is helpful, docs about experiments and results, experimental protocol specs.
- 18 months. Need to demonstrate commitment to focus on experiments before submitting RFCs to rfc-editor.
- Group focuses on LISP and LISP+ALT. Some results may apply to other approaches too.
- If other proposals from research community have similar interesting questions and experimentation issues, sure, they can create groups.

Questions:

- Joe Touch: Talk about rfc5111. Exp WGs are like WGs but have lower bar for creation and focus on experiments in order to find out whether a WG is worthwhile. Joe: 5111 talks about the WG being an experiment, not the WG performing experiments. Darrel: we are not experimenting in the way the RRG is, we're doing engineering work and learning about the design space through deployment and analysis.
- Jari: use the time here to talk about whether this group makes sense.
- Peter Lothberg: RRG is doing this, now we have another group doing LISP. Maybe have another EXP WG. If you make a WG, dilute effort.
- Joe Touch: is the issue that most people are concerned about in coming here whether this is IRTF or IETF, or are people concerned with whether the work is worthwhile. If everyone already agrees work is worthwhile but have process issues, start on process right now. Jari disagrees, wants to talk about whether work is worthwhile. Darrel: need to clarify what the work is before talk about where the work should go.
- Mark Townsley: supports Joe. Most people know what LISP is, but don't know what rfc5111 is.

Dino, BOF Update (based on draft-farinacci-lisp-08.txt)

Overview; changing mapping entries; mixed locators; spec changes; open issues.
Features in order:

- improve site multihoming (allow site to control ingress traffic paths, avoid renumbering, do it with lower opex)
- improve ISP traffic engineering, using a level of indirection rather than more specific injection.
- Reduce core routing table size
- aid in v4-v6 transition
- provide server load balancing in data center
- some form of mobility

Iljitsch: what's wrong with multihoming that needs to be improved? A: really it's a traffic engineering issue.

Vince Fuller: one fundamental problem with multihoming today is a local optimization with global cost.

Ron Bonica: a billion dollar number for network management ... how will introducing another layer of indirection make it simpler? A: no time to answer now.

Yakov: said something.

Lisp concepts slide
Multi-level addressing slide
Lisp is map-n-encap
LISP solution space
Unicast Packet Forwarding

Locator Reachability:

- Keep mapping entries in what you send out, but flip "loc-reach" bits when an ETR becomes unavailable. ETRs for a site can keep track of each other via their IGP.

  Christian Vogt: if D2 sends its packets to S2, how does S1 know there is a failure? A: packets load-split.

  Tim Shepherd: what if there isn't enough traffic to carry these bits?

  Ron Bonica: what about packets with malicious reachability bits?

  Yakov: don't need experimentation if you can answer questions by looking at the spec. That will tell you if you need experiments.

- How to remove a locator from a slot, "compaction"
  - Operational, "clock sweep". Used for DNS changes. Peter Lothberg: scaling issues for big hosting place --> Yes.
  - Protocol, "SMR" => solicit map-request. Skip over it, unfortunately -- read the spec.

Mixed locators: Have to skip it.

Open Issues

- Experiment with more specific mappings and policy-based Map-Reply priority changing.
- ISP resident TE-xTR functionality with another "multi-level LISP"
  hierarchy
- Firm up details on LISP-Multicast
- LISP can do some form of mobility
- More specific state only at edges in xTRs
- Can we extend it for secure and graceful handoff
- Continue prototyping ideas and deploying on pilot network
- Interoperability of NX-OS, IOS, and OpenLISP

Jari Arkko: your list of open issues is very different from mine. Yours is new features. His is "does this break applications" etc. He wants proof.

Ron Bonica: security analysis. DMM: there was a threat analysis by Marcelo in early days of LISP. It's out there. Update it.

Luigi Iannone, some experimental results

Netflow on border router. 1Gb link to Belnet (Belgium research network). Took traces and fed data to a program they wrote and simulated LISP cache. Cache hit rate even with 30 minute timeout is 98-99%.

Ric Pruss: so if it were hit rate per packet it would be even closer to 100%. Christian: this is hit rate per flow, not per packet.

Erik Nordmark: what prefix lengths? A: those in DFZ BGP tables.

Compare DNS lookups necessary to fill up cache. Higher than LISP lookup rate.

Onward ... LISP Testbed description. RTT measurements with and without LISP. Essentially no difference. Warnings about testbed, implementation, cache selections.

Jari: first packet is delayed. Luigi: no, cache is static.

Conclusions: do not need a huge cache, can have a good hit ratio even with a small cache. Lookup overhead is smaller than current with DNS. Tunneling overhead 2-15%.

Next steps: again one year later, run on different traces. Analyze impact of scan (and other data traffic) on the cache -- already included, but break them out. Different mapping granularity. Impact on initial flow setup.

URLs.

Dino: mapping lookups smaller than DNS ... what mapping system? A: no particular lookup system, just counting cache misses. It's cheaper or more frequent to do lisp lookup than dns lookup.

Yakov: Cache behavior analysis depends on traffic profile which depends on applications running.
  A system might fall apart when you change your applications. So document how to deal with changes in traffic profile.

Jari: would like to see "how does this p2p", which connects many peers.

Dimitri:

Christian Vogt: cache behavior depends on proactive versus reactive cache behavior. Wants to know what impact the choice of mapping lookup mechanism has.

Vince Fuller, LISP+ALT

Going through slides very fast. EID assignment. Regional like RIRs.

Issues:

- alt can use existing bgp security
- dos mitigation using control plane rate limiting
- nonce in lisp protocol exchange

Robert Raszuk: keepalives (bgp and/or gre) between xTR and ALT router.

Dimitri: -> A: only first packet between sites needs to be sent, to get prefix.

Erik Nordmark: what prevents an ETR from sending back a bogus mapping in Map-Request?

Yakov: EIDs highly aggregatable, but current Internet isn't aggregatable.

Christian: how can mitigate a DOS attack by rate limiting?

Large-site ETR policy. Advertise coarse-grain into ALT but reply more specifically with Map-Reply. Lots of possibilities.

Other issues:

- business model
- should it be rooted at/run by RIRs?
- Different levels run by different orgs
- should it be free?

Dino: issue: wrt mixed locators: If you have mixed locators, we assume ALT will be dual stack, but if site sees other site has a v6 locator, but underlying path only supports v4, have problem.

Marshall Eubanks: If you have millions of sites, how big does ALT need to be? A: low data traffic, low routing table size, and most will be "low opex".

Xiaohu Xu: attacker can use up cache on ITR.

Interworking (Darrel): Skipped.

Dave Thaler: Implications on upper layers

draft-thaler-ip-model-evolution-01.txt

"apps" means anything above network layer. Many have assumptions (or "myths"). App behavior example: "ping" candidate servers. If the first one is dropped, will end up with second one, non-optimal. Reordering and loss are rare. Apps assume "addresses" used by routing.
  Some make assumptions about locality. Some use raw sockets. What if app is sending to something in the core of the network, "inside" LISP.

Joe Touch: just because other protocols go against app assumptions, doesn't mean it's not a problem to go against them.

Discussion

Jari: take the "exploratory group" issue off the table. The proposal is for a regular IETF WG. Do we need to do any work in the design space here? Do we need an IETF WG in addition to the RRG?

Erik Nordmark: hasn't seen a list of experiments.

DMM: he does think LISP is ready, it's in an engineering phase. There are open questions, and there are always open questions in WGs. In general in a protocol WG (less for ops) he has never seen a working group that was chartered to do experiments about that protocol. DMM says they have been told by RRG that this LISP work is engineering and is not appropriate for RRG. Since they are in an engineering phase, want a WG.

So the question is whether LISP is mature enough to warrant a WG. Jari doesn't believe ready for real protocol specifications.

Tony Li: What happened in RRG is that we ask them not to do engineering. Don't spend meeting time on engineering issues that can be covered elsewhere. LISP can be a "sub-group" of RRG. RRG may charter "sub-groups" that can go off in a corner and do what they want and come back to the RRG with updates.

Marshall: really wants to do the work, but need time.

Lixia Zhang: Doesn't like that questions were shut down. We didn't have time. RRG is trying to work out direction. LISP is just one direction.

Iljitsch: ALT has fatal flaws, so it's useless. He will post his list of issues. He thinks it's an overlay of the Internet. Tie in with mobility etc. But he didn't explain his issues.

Yakov: a lot of open issues. A number of them don't require experimentation, so close them first.

Joe Touch: not spending enough time talking about content, but could do that as a sub-group in IRTF.
  IETF WGs have difficulty completing tasks, so don't start it out with a lack of explicit goals.

Mark Townsley: lack of precedent wrt the type of WG being asked to be formed here. Not true. HIP was formed to produce only experimental documents and we have learned a lot from it.

Ross Callon: Routing AD. There are two classes of questions we could talk about here. First technical issues, second WG or not. Second, without AD hat, concerned about frailty of solution. Also frustrated about lack of specific discussion of LISP on RRG -- wants more. Interested in sub-group.

Yakov: what lisp is trying to do is not just pure engineering. There are bigger issues, for caching. He thinks that's pre-engineering.

Jari: on RRG there is question of whether to cache at all.

David Conrad: how will you know if it works without experimentation?

Eliot Lear: see NERD, there are a lot of open questions about caching. Attempting to move forward through experimentation is trying to develop answers to research questions.

Elwyn Davies: concerned about mission creep. Thinks ALT is an "underlay" something like "MPLS". But he didn't explain his issue better than that.

Christian:

Mark Townsley: What the WG would look like. Yes it's like HIP. Before those docs advance to the IESG there must be some level of description of experiments performed.

Peter Lothberg: Time is running out, stop trying to be really sexy and just make something work. Too fractured, can't keep up. We have a problem to solve.

Jari: poll: There is interest in discussing; some concern about mapping system; difference of opinion on how to organize work. Room definitely wants more time to discuss LISP/LISP+ALT and larger RRG problems too.

Vote LISP+ALT WG, or Recommend LISP+ALT to RRG:

WG? About 20.
Stay in RRG? About 40.

Jari is 'committed to organizing more time for this'.