GROW notes, Wed am
==================

Agenda
MRT last call - finally. Email in the next week or so.
draft-dimitri-grow-rss as a WG doc? Will ask on list; no opinion in group.

John Scudder, draft-scudder-bmp
See slides. Only comments recorded. I'll record slides as I have time.

BGP monitoring protocol. Three years ago brought this draft to GROW and there was general interest in having a way to get full routing tables and aggregated info off a router in a protocol way instead of having to scrape it. But the mechanism proposed then was hard to implement. Like, what happens with congestion and back pressure? Recent renewed interest and collaborators. New approach provides similar benefits in an implementable fashion. Already implemented.

Router configured with mgmt station identity and a list of peers of interest. Connects to mgmt station and sends an initial dump of all routes for those peers, formatted as BGP UPDATE msgs wrapped in a BMP header. As peers advertise/withdraw routes, additional updates are sent to the mgmt station. BMP header has peer identity and a timestamp. Stats reports are threshold or timer driven; various defined counters (see slides).

Big difference from -00: BMP messages are not just clones of what came from the peer. They are regenerated from Adj-RIB-In, not Loc-RIB. Less "fidelity", but you converge in the same way BGP does. See slide. Not suitable for use as a routing protocol.

Re-adopt as a GROW WG item?

Discussion:
Dave Meyer: Great, really could use it in Routeviews and such. Run BMP at the remote site and collect from it. These days most just use Quagga, but want lightweight listeners. Having this on the send side would be very good.
Ruediger Volk: wants something like this, maybe with some laundry-list items tacked on, to get info about an operational network. Wants to see what is happening at the edge of the network. Not just for research; has operational use.
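As a rough sketch of the per-peer framing described above (peer identity plus timestamp wrapped around BGP UPDATEs): the field layout below follows the later standardized BMP rather than the exact draft under discussion, and all values are illustrative, not from the minutes.

```python
import struct
import time

# Sketch of a BMP-style per-peer header carrying peer identity and a
# timestamp. Layout loosely follows the standardized BMP per-peer
# header; the draft version discussed here may differ. Peer type,
# flags, and distinguisher are left as zeroed placeholders.
def pack_peer_header(peer_ip4, peer_as, bgp_id, ts=None):
    if ts is None:
        ts = time.time()
    sec = int(ts)
    usec = int(round((ts - sec) * 1_000_000))
    # IPv4 peer address carried in the low 4 bytes of a 16-byte field
    addr = b"\x00" * 12 + bytes(int(o) for o in peer_ip4.split("."))
    return struct.pack("!BB8s16sIIII",
                       0,            # peer type (placeholder)
                       0,            # peer flags (placeholder)
                       b"\x00" * 8,  # peer distinguisher (placeholder)
                       addr,         # peer address
                       peer_as,      # peer AS number
                       bgp_id,       # peer BGP identifier
                       sec, usec)    # timestamp

hdr = pack_peer_header("192.0.2.1", 65001, 0x0A000001, ts=1234567890.5)
assert len(hdr) == 42
```

The monitored UPDATE bytes would then be appended after this header and streamed to the mgmt station unchanged in structure as routes are advertised or withdrawn.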
Top need: some way of indicating something that happened in the policy process when we get the thing, for example knowing this route was dropped because my policy said so.
John: we want to keep it lean to have a working implementation, but will talk to co-authors.
Dimitri: Need to do consistency checks; want to know why a route was included or rejected.
John: first goal was just to get the routes off. A second goal could be the decision process, but be careful to keep it lean. Not now.
Dave Ward: After 10 years the idea is finally moving forward. Shows an implementation will drive standards. Wants the stats message to be just one type of a generic message so the router could send other interesting information.
John: stats is already just a TLV.
Ward: also suppression info will say how much is floating around that has to be dealt with.
John: Yes.
Dave Meyer: keep it lean. If we want to analyze rejections, do it offline; collect with another tool.
John: if we can get info cheap or free, might as well do it.
Ruediger: policy engines are generic; it's hard to gather the information offline.
Mike Patton: want to know *how* it picked that route, and getting it offline can't give you that if you don't duplicate typos etc.

Dino, LISP Implementation Report
See slides. Intent: to take the work further along, further the design; the end goal is to form a working group. lisp@ietf.org. http://www.lisp4.net, http://www.lisp6.net

Implementation schedule. Implementation design. The BGP in an ALT router is just straight BGP; no software changes. Inside the box: new things have dotted lines. ETR and ITR can be separate processes or separate boxes -- separate boxes not tested yet. LISP does not intentionally keep IPv4 alive or constrain IPv6; it fixes the same problems for both. Loc-reach bit support; learning a lot, see next IETF. Both Map-Request and Data-Probe supported on ALT, but default to Map-Request. Multiple EID-prefix support.
If we can allocate out of the IPv6 space from IANA, we can keep the number of prefixes way down. Added "accept-map-request-mapping", control-plane gleaning with a verification option; off by default. Data-plane gleaning was added a long time ago -- rather dangerous, so off by default. We believe LISP should not be used for high-speed mobility, but not sure about slow mobility. Christian Vogt concern about verification -- do it later.

Debug/show commands. Statistics. Traceroute support: 3-segment path. That works for IPv4 and IPv6. If v4-in-v6, actually made to look like a tunnel hop. IOS-XR implementation may do another level of encapsulation - TE-ITR/TE-ETR.

What's next? Solicit-Map-Request records (to notify peers that mappings have changed). Rate limiting and security. Policy priority assignment when returning Map-Replies. LISP-Multicast: see MBONED. PTR optimizations to reduce cache size, condensing them. Continue lowering OpEx of xTRs. Have removed BGP completely. Maybe have ALT infrastructure not extend all the way.

Questions
Christian Vogt: security issue: send probe through the ALT.
Dino: that's what's done now.
Christian: [missed a little] ...
Dino: if you build an ETR, there's little reason not to include ITR capability. It's like building BGP without route reflector support. But a PTR is a special case; it only encapsulates packets.
Christian: PTRs: concern that the guy who gets the benefit is not the one who has to deploy the box. The whole system has to be deployed, and there is no benefit just to have one.
Dino: there is a lot that is done for the good of the Internet. Who will run these boxes?
Christian: a PTR attracts traffic.
Dino: that's good, SPs like to attract traffic. If they deploy PTRs they will do it for the benefits they see.
Ron Bonica as individual: "mapping cache": concern about cache-based forwarding. Will LISP have the same problems?
Dino: has been doing this for 20 years or more, and thinks cache problems are implementation issues.
We cannot get fast CPUs built without L2/L1 caches. Those are successes. In the early days of Cisco, caching worked for a long time, then failed later. What is a cache? Granularity, coarseness. ==> talk about it in the bar.
Ron: please talk about what led to bad experiences.
Dino: has DNS caching been successful? Has Akamai?
Someone: This assumes the existing infrastructure as a given. You have to have that in order to build on it. Also, your architecture could use anything as a mapping function.
Dino: ALT is an example, but we have four other ways of doing mapping. We have push and pull and DHT.
Same: mobility: are you really chucking that out? Not a driver for us.
Same: will this scale up? Will you test with a million agents? Could prepopulate the DB with all these.
Yakov: If you drop data on the floor on a 10Gb link, do you drop it all?
Dino: It's the same if you drop a BGP route.
Yakov: today we can fit all routes.
Dino: there are failure scenarios, there could be resource problems, so you need to reduce table size. For example you could put a 0/0 map cache entry in and push the problem somewhere else. Choose your tradeoff.
Dimitri: the diagram shows the interfaces but what are the procedures? Should show how cache entries are processed.
Dino: this is an implementation diagram. We could go to another level of detail to explain everything.
Dimitri: why no arrow between mapping DB and mapping cache?
Dino: you could implement it that way.
Ron Bonica with hat on: wants a draft on ops experience with cache-based forwarding in general.
Dino: there's pull and there's push, and we know DNS wouldn't have scaled as a push database.
Dino: this is an open effort, not just Cisco. For the good of the Internet.

We break now. Interim meeting? No plan at this time.
Jari Arkko: the first presentation was kind of rerunning the BOF, not good. There was a lot of information, e.g. caching. Don't focus on the technology so much; let's talk about operational impact.
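The "put a 0/0 map cache entry in and push the problem somewhere else" tradeoff from the cache discussion above can be sketched as a longest-prefix-match lookup with a catch-all entry; prefixes and RLOCs here are invented for illustration, not from the testbed.

```python
import ipaddress

# Minimal sketch of an ITR map-cache: longest-prefix match over cached
# EID-prefix -> RLOC entries, with a 0/0 default entry that shunts
# cache misses to some other box (e.g. a proxy) instead of growing the
# table. All entries are illustrative.
map_cache = {
    ipaddress.ip_network("0.0.0.0/0"): "default-rloc",    # catch-all
    ipaddress.ip_network("203.0.113.0/24"): "198.51.100.7",
}

def lookup_rloc(dst: str) -> str:
    addr = ipaddress.ip_address(dst)
    # Most-specific covering prefix wins, as in ordinary LPM forwarding.
    best = max((p for p in map_cache if addr in p),
               key=lambda p: p.prefixlen)
    return map_cache[best]

assert lookup_rloc("203.0.113.9") == "198.51.100.7"
assert lookup_rloc("192.0.2.5") == "default-rloc"  # miss falls to 0/0
```

The tradeoff is exactly the one stated in the discussion: the 0/0 entry bounds cache size but moves the per-destination resolution work to whoever sits behind the default locator.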
Darrel Lewis: LISP practice and experience
Agenda. LISP+ALT today. All eBGP except site Darrel, which has two GRE tunnels but is not running BGP, just a static route. The ALT aggregators originate EID announcements for titanium-darrel.

Deployment model. Hard to have a Titanium in your office; lots of noise and heat. Assignment strategy: got v4 space from Andrew Partan. High-level geographic. Mixed locators: can respond to a Map-Request for a v6 EID with a v4 locator and vice versa. xTR configuration. Mixed locator configuration. PTR configuration.

Case Study 1: When turned on, LISP broke all connectivity. All static map cache commands -> broke external connectivity. Need to distinguish whether an address is on ALT or not.

Case Study 2: couldn't ping between sites, but could ping between RLOCs. Unit testing worked between EIDs. Test from loopback to loopback failed. ==> receive-path decaps handled differently than forwarding path.

Case Study 3: IPv6 EID pings IPv6 EID over IPv4 encaps, mixed-locator RLOC. But they had dual stack, and their return Map-Reply came via IPv6. Map-Reply was generated in IPv6 format but the sending site was v4 only. Just because a site supports a protocol family doesn't mean there is a path for it. --> let a site determine the address family it would use.

Lessons learned. ALT is simple to configure and operate. Developing a debugging methodology is critical. For web-based apps at least, issues of stretch and first-packet loss are overrated -> moved to Map-Request for default.
Lars Eggert: TCP has a 3-second retransmit. If you don't see a difference, either ALT forwarding is really slow or something else is going on.
Darrel: setting up the page takes longer than the time lost from the first packet drop. The difference is half a second.
Lars: but the TCP retransmit is 3 seconds, so there should be a 3-second difference, unless something else is going on.
Dino: there were so many packets going through the system that the SYN packets were never dropped. Need traceroute.
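The Case Study 3 lesson above ("let a site determine the address family it would use") amounts to filtering a mixed-locator set by the family the requester demonstrably has a path for; a minimal sketch, with invented names and the requester's family approximated by the Map-Request's source address:

```python
# Sketch of address-family selection for a Map-Reply from a site with
# mixed v4/v6 locators. Supporting a family (dual stack) is not the
# same as having a path for it, so key off the family the Map-Request
# actually arrived over. Names and addresses are illustrative.
def usable_locators(rlocs, request_src):
    fam = 6 if ":" in request_src else 4
    return [r for r in rlocs if (6 if ":" in r else 4) == fam]

site_rlocs = ["192.0.2.1", "2001:db8::1"]
assert usable_locators(site_rlocs, "198.51.100.9") == ["192.0.2.1"]
assert usable_locators(site_rlocs, "2001:db8::9") == ["2001:db8::1"]
```

In the failure described, the reply was generated in IPv6 form toward a requester that was v4-only on the wire; filtering by the observed request family avoids that.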
Cache optimization in ITRs is important. PTRs get a huge cache because of general Internet scanning of all possible prefixes.

Open Questions: who runs the mapping system, business models. Can LISP be used for IPv6 transition? Effects of the mapping system on apps. PMTU effects. Caching behavior in xTRs. Enhancing locator reachability detection. Making xTRs even easier to manage.

Q&A:
Yakov: deployment model: what is the significance of assigning EIDs in a geographical fashion?
Darrel: ALT can easily be deployed in a way that aggregates.
Yakov: but addresses aren't allocated that way.
Darrel: (1) addresses are mostly allocated that way, e.g. at the RIR level. Some outliers, but they can still get their ALT service from the same. (2) [something].
Yakov: so the cache will be the size of the Internet routing table.
Answer: No. ... we move on ...
DMM: wonders if an operational role for the RIRs is feasible. They may have one for SIDR already.
Yakov: If you have a miss in the ALT VRF, do you forward by the default VRF?
Darrel: Yes.
Yakov: why ever forward data through the LISP VRF?
Darrel: check ALT first.
Yakov: that's not the question.
Dino: there won't be a route.
Yakov: Work on path feasibility? Anything like ICMP unreachables?
Answer: ICMP may be filtered.
Dino: loc-reach bits will tell you for a certain class of failures; we're thinking about covering another class of failures.
Yakov: ... these are "open research issues", for RRG not IETF.
Christian: how many levels of ALT routers?
Darrel: open question; he envisions it being 2-3 levels, but it will depend on operational experience. Tunnels don't follow topology. All this can be adjusted easily.
Dimitri: LISP and multicast?
A: not implemented yet.
Ron Bonica with no hat: maybe next change the experiment to emulate scale, e.g. introduce lots of traffic, lots of prefixes that don't aggregate; maybe bump into ICMP rate limits.
Answer: point to analysis of NetFlow data and map caches.
Dimitri: wants more on stability of ALT.

We're done. Look for last calls.