Routing Research Group (RRG)
Meeting minutes from IETF-70, Vancouver, Canada

SESSION 1: 9:00-11:30 MONDAY DECEMBER 3, 2007

Agenda:
9:00am Chairs: Logistics, agenda bashing
9:10am Simon Schuetz: Node Identity Internetworking Architecture
9:55am Sheng Jiang: Hierarchical Routing Architecture
10:15am Christian Vogt: The Path to Six/One - Incentives, Backward-Compatibility, and Deployment Flexibility
10:45am Petri Jokela: An integrated Six/One-HIP implementation
10:55am Fred Templin: Sprite MTU Determination

TIMELINE FOR CONVERGENCE

As part of the chairs' logistics presentation, Tony Li presented, for comment, a proposed timeline toward convergence on a single recommended routing architecture.

- Plan to drive toward convergence. Want to make one recommendation, one proposal. Start the drive toward convergence in March '08; finish converging by March '09.
- DMM: grand idea, but the only procedural thing is "attain rough consensus". Develop criteria? We already did a draft on design goals and will do a taxonomy draft. DMM: if we're using those as dimensions/metrics, we need to develop the methodology for comparing. Lixia and Scott Brim will put together a draft on the dimensions along which proposals are to be compared.
- Eric Fleischmann: have a list of the proposals on the table, each with a short summary and its pros and cons against the requirements.
- Jari: more emphasis on solving fundamental problems versus fights between solutions. All existing proposals have significant issues that remain to be solved.
- Geoff Huston: in multi6, with 40 proposals, there was a generic draft that took the design goals and criteria and put them in a generalized architecture; proposals were then classified in terms of that architecture. Doesn't think a wiki page can handle the subtleties; we need deeper analysis ... within a consistent architectural framework.

NODE IDENTITY INTERNETWORKING ARCHITECTURE (SCHUETZ)

- Not purely routing, not ITR/ETR, not transparent for end hosts, not an incremental update to BGP, not unifying the network layer. ID/loc split.
  Framework for new routing approaches.
- Nodes have (crypto) identities, NIDs, as in HIP. Can route on these within a domain, or on anything else you want. [Routing, sure, but you still need a common packet format end to end.]
- Nodes have locators, grouped in locator domains (LDs).
- Joel Halpern: doesn't like the assumption of the Internet having a center. How central to your proposal is it that there is a single core and other things hang off it? We used to have that and it went away.
- Route up to the core LD using default routes; across the core using the destination LD; across the destination LD using the ID.
- There can be nodal routers (NRs) inside domains as well as at the edges.
- Routing hint: the NR responsible for a destination node, mapped to a locator.
- Forwarding: if stateful, signal to install per-session state along the path. If stateless, include an NID header in each packet. Compare to HIP: HIP SPINAT is similar to the stateful approach here.
- Inter-LD mobility requires registration of the node.
- Multihoming: the node registers along multiple paths. NRs have multiple entries per node, enabling TE. Details yet to be worked out.
- Open design issues: a number remain; the design is still at an early stage.
- John Scudder: this is not a routing protocol proposal. Is this a proposal for a new internetwork layer? A: yes, at least.
- Kevin Fall: HIP has the same fate-sharing model as IP, where you do the mapping at the end. This requires similar ... [answer not heard].

HIERARCHICAL ROUTING ARCHITECTURE (HRA), SHENG JIANG

- Routing growth threats, caused by multihoming, etc.
- Deering hourglass: the IP layer needs both an identifier and a locator. HRA is one of the ID/loc split solutions.
- Independent locator domains (LDs), with the possibility of multiple independent address families; the combination of LDID and local locator forms a globally unique locator.
- 2-level hierarchical routing, inter-LD and intra-LD. Locator Domain Border Routers exchange LD reachability information. Internal routers only store internal routing information.
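The two-level routing described above can be sketched as a single per-router forwarding decision. This is a minimal illustration, assuming a globally unique locator is modelled as an (LDID, local locator) pair; the class and function names, the table contents, and the fallback to a default border router are hypothetical, not taken from the HRA proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalLocator:
    ldid: int   # locator domain ID: the only part inter-LD routing looks at
    local: str  # local locator: unique only inside its own LD


def next_hop(dst: GlobalLocator, my_ldid: int,
             intra_routes: dict, ld_routes: dict, default_border: str) -> str:
    """Two-level forwarding decision for one router (hypothetical model).

    intra_routes: local locator -> next hop, for this router's own LD only.
    ld_routes:    LDID -> next hop; populated only on LD border routers,
                  which exchange LD reachability with each other.
    """
    if dst.ldid == my_ldid:
        # Intra-LD routing: resolve the local locator within our own domain.
        return intra_routes[dst.local]
    # Inter-LD routing: a border router uses its LD reachability table;
    # an internal router has none and defaults toward a border router.
    return ld_routes.get(dst.ldid, default_border)


# The same local locator in two different LDs: the LDID disambiguates them.
intra = {"::a": "eth0"}
print(next_hop(GlobalLocator(7, "::a"), 7, intra, {}, "ldbr-1"))  # eth0
print(next_hop(GlobalLocator(9, "::a"), 7, intra, {}, "ldbr-1"))  # ldbr-1
```

Internal routers in this model never hold per-LD state, which mirrors the point that only border routers carry LD reachability.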
- A management domain (MD) may cover several locator domains and runs one or more mapping servers, which store local HIT->locator mappings. Servers also store MDID->LDID mappings (so that lookup queries can be forwarded to the right MD).
- "Hierarchy" is the keyword in HRA: a hierarchical host identifier tag (HIT, made of a "management domain ID" and a hash of the host identifier), and a hierarchical mapping system from HITs to locators. Believe that a flat host ID has scaling issues and is hard to manage; a hierarchical HIT enables highly efficient lookup.
- Someone from China Mobile: how do you guarantee the HI is unique in the local domain? A: there is duplicate detection, as in IPv6; each LD is small, so duplicate detection is feasible.
- A number of open issues: LD management in the face of merges and splits; routing policy support; incremental deployment.
- When a host moves, it re-registers with its new locators but keeps the same identifier (the ID is persistent because it is associated with the MD, not the LD). Use home mapping servers (as in MIP) to scale.
- Scudder: this is basically replacing the internetwork layer.

THE PATH TO SIX/ONE, CHRISTIAN VOGT

- We've known for a long time that something is wrong with interdomain routing scalability; why is change so hard? Because we need a good transition path. There is such a path for Six/One.
- Hosts have address bunches. Routers may rewrite the source address. Hosts recognize rewrites and adapt.
- Description of Six/One.
- Getting here.
- Incentives for providers: smaller routing table, less churn.
- Edges: can redirect packets from one provider to another; reduced renumbering costs.
- Hosts: influence over path selection, lightweight IPv6 crypto, cross-layer optimization.
- Deployment flexibility: rewrite functionality can be in arbitrary routers.
- Backward compatibility: enable or disable Six/One depending on support at the remote side.
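To make "hosts have address bunches, routers may rewrite the source address, hosts recognize rewrites" concrete, here is a sketch under an assumption the minutes do not state: a bunch is the set of a host's addresses that share one 64-bit interface identifier under different provider /64 prefixes, and a router rewrite swaps only the prefix. Function names and the addresses (from the 2001:db8::/32 documentation range) are hypothetical.

```python
import ipaddress

IID_MASK = (1 << 64) - 1  # low 64 bits: interface identifier (assumes /64 prefixes)


def address_bunch(iid: int, prefixes: list) -> set:
    """One address per provider /64, all sharing the same interface identifier."""
    return {
        ipaddress.IPv6Address(int(ipaddress.IPv6Network(p).network_address) | iid)
        for p in prefixes
    }


def rewrite_source(src: ipaddress.IPv6Address, new_prefix: str) -> ipaddress.IPv6Address:
    """Router side: swap the source's /64 prefix, keep its interface identifier."""
    net = ipaddress.IPv6Network(new_prefix)
    return ipaddress.IPv6Address(int(net.network_address) | (int(src) & IID_MASK))


def is_rewrite_of_peer(peer_bunch: set, observed_src: ipaddress.IPv6Address) -> bool:
    """Host side: a rewritten source is still recognized if it is in the peer's bunch."""
    return observed_src in peer_bunch


bunch = address_bunch(0x1234, ["2001:db8:a::/64", "2001:db8:b::/64"])
sent = ipaddress.IPv6Address("2001:db8:a::1234")
seen = rewrite_source(sent, "2001:db8:b::/64")
print(seen)                             # 2001:db8:b::1234
print(is_rewrite_of_peer(bunch, seen))  # True
```

Because the receiver checks bunch membership rather than an exact address, a prefix swap by any router on the path does not break the session, which is the flexibility claimed for placing rewrite functionality in arbitrary routers.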
- New transition tools:
  - proxies are used to interface legacy (IPv6) hosts with Six/One;
  - translators are used to communicate with remote legacy hosts; they replace addresses in packets without reverse translation. Proxy and translator can be combined;
  - a Six/One proxy can redirect/map/translate addresses to bridge IPv6 to IPv4, tunneling Six/One packets over the IPv4 Internet.
- Joe Touch (ISI): proxy, translator and gateway ... why will it work better here than it has in IPv6? A: it gives more control to the host. Q: so the reason it has failed for v6 is that we have not put enough mechanism in the hands of the host? A: that's one point; the other is that we need a transition technique for the entire v6 address space. Q: we want solutions that make the host's work smaller and more automatic. A: all a host needs to do is retrieve an IP address from the remote gateway via DNS. It has to do DNS anyway, so it's automatic and piggybacks.
- Jari Arkko: you want to avoid host changes by having a proxy, but making the proxy work requires changes in hosts (e.g. DHCP, which not all hosts support). There might also be other mechanisms to assign IP addresses that the proxy solution has to consider.
- Jari: general question: we know we have routing pain, and solutions propose things like NATs, so that's transferring pain, particularly to hosts. Think about the tradeoffs. Liked Six/One in its original proposal, but doesn't like all these added bells and whistles: too much complexity, causing pain. A: this is during transition.
- Dow Street: what's the nature of the interaction between edge and core, between hosts and the routing system? Should a host have influence on how packets reach it? A: likes giving control to the host. If the host knows when the path has changed, applications can adapt.
- Darrel: has a feeling that if you add up the time for the v4-to-v6 transition, then add in an ID/loc transition, we will never finish. No changes to hosts is good. A: we should be able to do the transitions in parallel.
- Darrel: are you concerned about the amount of state a proxy has?
  Some domains are large (in terms of number of hosts). A: don't think so; perhaps similar to the amount of state at an ITR. Q: if a proxy fails, what happens to sessions? A: yes, they are affected. Same as when an ETR fails? Q: well, it depends on how much state there is. A: the state is in the ITR, and it has to change.

INTEGRATING SIX/ONE AND THE HOST IDENTITY PROTOCOL, PETRI JOKELA

- Here is a Six/One implementation, now being combined with HIP. There is similar functionality between the two, so Six/One is being integrated into HIP to maximize reuse. Important coding work is now underway; address rewriting in the network is the only significant modification left to be completed.
- Suggestion from the audience: references to "CGA" in the presentation should be replaced with terminology that clearly differentiates Six/One addressing mechanisms from the more common cryptographic addressing schemes now widely referred to as "CGA". CGA involves public-key authentication, which is not necessary in Six/One; Six/One only uses a hash to derive addresses.

SPRITE-MTU, FRED TEMPLIN

- Tunnel endpoints need to be sure the correct MTU is reported back to sources.
- SPRITE-MTU aims to maximize packet delivery ratio and manage fragmentation, while minimizing intra-tunnel fragmentation and reassembly misallocation, in an environment of heterogeneous (but RFC 4213 compliant) MTUs ranging from 1280 up to 1480-1500 bytes.
- Solution: UDP echo service for MTU discovery; soft-state management to track tunnel parameters; explicit congestion notification. Configuration knob for fragmentable outer packets; avoid TFE receive buffer overrun; avoid/minimize fragmentation on the TNE->TFE path.
- Additional required work was identified, specifically to accommodate cases where outer fragmentation makes some encapsulation schemes risky. Cases identified included Teredo and environments in which many nodes behind a NAT talk to the same TFE. Solution: UDP fragmentation for Teredo.
  Use ICMP echo request/reply as a fallback if the TFE does not implement SPRITE-MTU. Possible future extensions of the work include adaptation for IEEE 802.3as links with EMTUs of up to 2k.
- Iljitsch van Beijnum: if two tunnel endpoints are behind 1500-byte MTUs, hosts have 1500, and Teredo is used, what is the MTU? I send 1500; the TNE can't add headers because the packet is already at 1500. What happens? A: the host gets "too big" back. Q: and if it doesn't adjust? A: then it is not obeying RFC 1191. Under SPRITE-MTU, ETRs must specify an EMTU_R of 2k.

Any other topics? ... No.

----------------------------------------------------------------------

Routing Research Group
Meeting minutes from IETF-70, Vancouver, Canada

SESSION 2: 9:00-11:30 FRIDAY DECEMBER 7, 2007

Agenda:
9:00am Chairs: Logistics, agenda bashing
9:10am Dino Farinacci, Dave Meyer, Eliot Lear, Vince Fuller, Scott Brim, Darrel Lewis: LISP proposal and prototype update, LISP CONS, LISP NERD, An EID-to-RLOC Mapping system, LISP EMACS, Interworking LISP with IPv4 and IPv6
10:40am Dan Jen, Michael Meisel: APT update

DINO, VINCE, DARREL: LISP, LISP-ALT, LISP INTERWORKING

Contents: deltas since the Chicago RRG meeting; spec updates, protocols, interworking, prototype test and deployment.
The LISP sub-schedule was adjusted: the 5-minute slots were reduced to brief summaries to leave more time for discussion.

DINO: LISP-02 VS LISP-05

- Added a mobility section to the text. LISP is not optimized for fast mobility; we want to reduce the rate of change to the mapping database.
- Negative mapping entries, to signal that a prefix is not handled by LISP.
- Text about MTU. Result from a survey: most people say inter-router links are at 4470, and jumbo frames are supported by a lot of products --> don't put mechanisms in ITRs for MTU issues. (Not everyone agrees with the survey result.)
- UDP port 4342 is used for the control plane; 4341 was allocated for the data plane. Not testing for a data type (UDP etc.), just for port 4341.
- Data-plane packet simplified: no LISP "data type" field. Locator reachability bits increased from 12 to 32.
  Nonce reduced to 32 bits.
- UDP checksums: IPv6 says you must compute the checksum. We allow a checksum of 0 to be transmitted ... but the IPv6 header has no checksum. Lars Eggert: use UDP-Lite, designed for IPv6. But we want to use the same protocol for IPv6 and IPv4. Lars: we can sort this out.
- R bit in Map-Replies, so that loc-reach bits can be conveyed.

Slide 10: why so many mapping designs? If you build a network-based solution, the main problem is that a packet arrives at a box that doesn't have the mapping state. You can drop it and "ARP", queue the data until you get a reply (not scalable), or use the data as a request. Lots of tough questions about the mapping database designs. Keith Moore adds: who controls it? Who pays for it? Who manages it? How do you keep it out of the control of unsavory elements? Meyer & Dino: leave the policy issues until later? Keith: these things interact. I don't want to drill down into policy now, but you should be thinking about it now.

CONS: not much changed. NERD: minor changes. LISP 1.5 (using data probes to query for a Map-Reply) is now LISP-ALT. LISP-EMACS is based on multicast. Added triggered Map-Replies to the main LISP spec.

Putting NERD and CONS on the back burner and focusing on ALT for now; ALT seems to have a lot of potential.

Don't want the locator set for a prefix to change very often. This is why LISP isn't oriented toward fast mobility; it's a tradeoff. Same for locator set reachability. Don't want to depend on another security design or infrastructure. There is nothing both simple and deployable now.

Prototyped a mobility design. It doesn't require host routes; it uses very specific state in the correspondent node's (CN's) ETR, which stresses scaling at the CN's ETR. Solved the ITR spoofing problem but not mapping authorization. Putting the design on the shelf for now due to a number of unresolved issues.

LISP is not a Cisco-only project. Cisco has no IPR. Open policy on LISP work. Soliciting involvement: need implementors, interoperability, lots of research and analysis.
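The data-plane deltas discussed above (UDP destination port 4341, 32-bit locator-reach bits, 32-bit nonce, no "data type" field, a UDP checksum of 0 allowed) can be illustrated with a minimal packing sketch. The 8-byte LISP header with the locator-reach bits first and the nonce second is an assumed layout, not copied from the LISP draft; the outer IP header is omitted, and the function names are hypothetical.

```python
import struct

LISP_DATA_PORT = 4341  # data-plane UDP port per the minutes (4342: control plane)


def encapsulate(inner: bytes, src_port: int, loc_reach_bits: int, nonce: int) -> bytes:
    """UDP header + assumed 8-byte LISP header + inner packet (outer IP header omitted)."""
    if not (0 <= loc_reach_bits < 2**32 and 0 <= nonce < 2**32):
        raise ValueError("locator-reach bits and nonce are 32-bit fields")
    lisp_header = struct.pack("!II", loc_reach_bits, nonce)
    udp_length = 8 + len(lisp_header) + len(inner)
    # Checksum sent as 0, which the minutes say LISP allows (the IPv6 question is open).
    udp_header = struct.pack("!HHHH", src_port, LISP_DATA_PORT, udp_length, 0)
    return udp_header + lisp_header + inner


def decapsulate(payload: bytes):
    """Classify on the destination port alone; there is no "data type" field to test."""
    _src, dst, length, _csum = struct.unpack("!HHHH", payload[:8])
    if dst != LISP_DATA_PORT:
        raise ValueError("not LISP data-plane traffic")
    loc_reach_bits, nonce = struct.unpack("!II", payload[8:16])
    return loc_reach_bits, nonce, payload[16:length]


pkt = encapsulate(b"inner-ip-packet", 50000, 0b101, 1234)
print(len(pkt))          # 31
print(decapsulate(pkt))  # (5, 1234, b'inner-ip-packet')
```

Keeping classification to a single port check is what lets receivers drop the "data type" field; the bit positions in the locator-reach word would map to entries of the locator set.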
CONS review: when an ITR gets a packet, it sends a request up a hierarchy to the authoritative edge, and the edge replies with a Map-Reply. What we learned: we wanted to optimize aggregatability of EID prefixes and focused on controlling the rate*state product; however, another dimension came up: latency.

NERD review: a push model, really cool, but you might need some memory. NERD and storage requirements.

VINCE FULLER: LISP-ALT

Conceptually similar to CONS, operationally very different. CONS was the invention of a whole new set of protocols; ALT is built using familiar techniques, GRE and BGP. Option for data-triggered Map-Replies; concern about DoS vectors and performance impact.

LISP-ALT routers are interconnected by GRE tunnels to form a "Logical Alternative Topology" (LAT), with TRs connected at the edge of the LAT. It is not predetermined who will manage the overlay infrastructure: ISPs, IXCs, RIRs, neutral parties? A new revenue source for infrastructure players?

Run BGP over the LAT for EID prefix propagation. The LAT is not expected to carry high-capacity data flows; one could build it out of Linux boxes. The assumption remains that EID space is assigned hierarchically.

Jari: does that aggregation mean that you are bound to your identifier provider? ... deferred.
Christian Vogt: the best attachment point (of the mapping info source) is within a provider. A: but that attachment isn't a physical attachment ... discussion deferred to later.

Showed a build slide of how ALT works: when a packet from 240.0.0.1 to 240.1.1.1 (both identifiers, i.e. EIDs) reaches an ITR, the ITR wraps it into a LISP packet, using the destination EID in the OUTER header, then forwards the packet to the first ALT router at the edge, which uses BGP to route the packet to the ETR that has the specific prefix information for 240.1.1.1. The ETR knows the destination, unwraps and forwards the packet, and sends a Map-Reply.

DINO CONTINUES

Brief overview of LISP-EMACS: build an alternate topology as in LISP-ALT (BGP over GRE).
  Run a PIM bidirectional shared tree. ETRs hash their EID prefixes to a multicast group ID and join the group. ITRs send EID mapping queries to the multicast group; the ETRs that own the EIDs respond with a Map-Reply over either the LAT (allowing caching) or the direct topology (potentially faster, but losing the caching benefit).

Comparison between LISP-EMACS and LISP-ALT: EMACS advertises multicast tree roots using BGP over a GRE topology, while ALT advertises EID prefixes. EMACS can need fewer routes than LISP-ALT; however, packets will flow to more places than the intended router. An interesting idea, but concern was raised about overloading sites joined to the same group.

Prototype update: ALT was easy. No changes to GRE or BGP. Needed to change the LISP code so that when a packet comes in and there is no route, it looks in the other VRF. Focus on ALT for now.

Testbed status: added support for using 240/4 as EIDs. Added LISP 1.5/LISP-ALT support. Multiple EID-prefix testing completed. Multiple locator testing completed.

There is a mailing list now: lisp-interest@lists.civil-tongue.net. To subscribe, send an email to majordomo@lists.civil-tongue.net with this in the body: subscribe lisp-interest

Next steps: a deeper dive into ALT, more thought about security, experiments with hybrid approaches (ALT/NERD or ALT/CONS). Experiment with movement, aggregation, anti-entropy, and IPv4/v6 interworking (IPv6 EIDs over IPv4 locators). External pilot in spring '08; taking names.

Jari Arkko: what purpose does this experimentation serve? We have a number of alternative techniques. How can one find out whether one thing works better than another? We need to talk about what we want to test within the RRG.
Dave Thaler: when communicating from a LISP site, did you assume that the table was a complete table of all sites that had LISP enabled? A: yes.
Thaler: in the test example there was no hierarchy. If you have hierarchy, where the ITR doesn't have a complete table, what does the prototype do with a packet destined to a non-LISP site: drop, queue, or forward? A: haven't gotten that far; today's prototype assumes a complete table.
Kevin Fall: likes Jari's first question about EID structure: EIDs can be anything, including CGAs. Would there be EID providers that you'll then be attached to, instead of the underlying IP address? A: it's imperative that the coupling be loose. If RIRs allocate EIDs, they will aggregate. We think aggregation is important.
Brim: LISP can do anything; it doesn't depend on the EID being an IP address, as long as there is someone who wants to use it to talk to you and a mapping mechanism.
Eliot: NERD doesn't aggregate at all. The worst case for ALT is the state you have for NERD.

DARREL LEWIS, INTERWORKING

4 cases: non-LISP (NL) -> NL, LISP -> LISP, LISP -> NL, NL -> LISP. Interworking handles the last 2 cases.

2 types of LISP sites: EID prefix from routable space (LISP-R), EID prefix from non-routable space (LISP-NR).

Routable EIDs are published in both the existing BGP DFZ and the LISP mapping database, and can only be withdrawn from the DFZ after the transition is 'completed'. This enables initial LISP sites to transition, but doesn't scale.

Proxy Tunnel Routers (PTRs) advertise highly aggregated EID prefixes and encapsulate traffic from non-LISP sites to the RLOCs of LISP sites. Traffic is asymmetric: return traffic does not go back through the PTR.
Fred Templin: isn't a 6to4 relay router basically the same thing? A: yes. A PTR doesn't proxy; it is a remote encapsulator.

LISP-NAT handles 2 cases: (1) LISP-NR site -> NL, (2) LISP site with RFC 1918 private EIDs -> other sites.

OPEN MIKE, Q&A

Christian Vogt: are routable EIDs PI prefixes? How do you reduce the routing table? A: we'd have to make them NR sites and have them withdraw the prefixes; there's no magic here.
Vogt: looks costly, because you need to participate in two new topologies, the proxy topology and the LAT. You want a proxy tunnel router close to the legacy site, and there is no motivation for that. A: PTRs are in the infrastructure, where you can take advantage of aggregation, and you want enough of them to distribute the load. You can start with one or two, and all traffic will sink to them once they are advertised. Of course that has lots of stretch, but one must start somewhere.
Robin Whittle (via Tony): how do PTRs differ from anycast in cores and Ivip? A: operationally the same, but tons of differences in detail.
Robin: how would a negative mapping entry work with PTRs? A: negative cache entries are for non-LISP sites. If a packet is going to a PTR, it's targeted at something the PTR advertised, so the negative cache doesn't apply. Darrel: but conceptually it might be interesting.
Jari Arkko: we need to separate technical capabilities from motivations for doing something. Why would I, as an end site, deploy a PTR? Darrel: the end site doesn't have to run BGP; PTR versus NAT. ... Business relationships still need work.
Michael Meisel: at the top of the hierarchy, do they get heavyweight? A: they carry the same or less load. Q: both PTRs and high-level ALT nodes need to do more work as the number of LISP sites grows. Where's the economic motivation for running these things? What about mid-level and top-level ALT routers? A: encapsulation is easier than decapsulation, so it's good that PTRs only encapsulate and never decapsulate. Asked customers: it is easier to handle 100k routes than 10k interfaces. Let's err on the side of having routes rather than hard state.
Iljitsch van Beijnum: you first look up the ALT mapping and then the regular routing table; why that order? A: because if there is a LISP site, you really want to encapsulate. Think about the default route. Q: what if part of the ALT topology splits off from the rest, while the physical topology is still reachable?
A: in reality, there's enough meshiness and aggregation that prefixes don't go away in the core.
Iljitsch: why do you need UDP? A: because the ITR needs to modify the source port, so that there is something to hash on for equal-cost multipath.
Iljitsch: 3rd question ... take it to the list.
Erik Nordmark: security and ALT: what prevents someone from accidentally aggregating someone else's prefix? A: you can have filters. Erik: you need a business relationship between the ETRs and the LAT. How do you get fanout? Dino: you're right; it's going to have a different topology fanout.
Jari: how would filters work with PTRs? --> discuss on the list.
Steve Blake: how do you do ITR anti-spoofing, or does the ETR not do reverse mapping? A: it can, but by default does not. Gleaning is known to be risky and won't be done if not certain.

DAN JEN AND MICHAEL MEISEL: APT

This is a very comparison-focused talk.

The big picture: APT and LISP are similar at a high level; both do map-and-encap, and both need a mapping service design.

Our design philosophy is "do no harm" while solving the routing scalability problem:
- avoid packet loss;
- minimize latency of the mapping service;
- align cost and performance (people who pay get the benefit).

APT differs significantly from LISP in:
- mapping info distribution;
- handling of transient failures;
- deployment scenarios: where to deploy and what motivates it.

The different LISP mapping schemes differ from each other; the comparison is mainly with LISP-ALT.

Terminology: for the presentation, EID and RLOC are borrowed from LISP, though here EIDs are not identifiers, just addresses/prefixes.
"MapSet": maps an EID prefix to the entire set of ETR RLOCs through which it can potentially be reached. Used by APT default mappers (DMs), and by TRs in LISP.
"MapRec": maps an EID prefix to just one of its potential RLOCs. APT TRs cache MapRecs, not MapSets.
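The MapSet/MapRec distinction can be illustrated as follows: a default mapper holds full MapSets and hands an ITR a single-RLOC MapRec chosen by its own policy. This is a hypothetical sketch; the longest-prefix match, the `policy` callable, and the example prefixes and RLOCs (documentation addresses) are assumptions, not APT's specified behavior.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class MapRec:
    eid_prefix: ipaddress.IPv4Network
    rloc: str  # exactly one of the prefix's potential RLOCs


def select_maprec(mapsets: dict, eid: str, policy) -> "MapRec | None":
    """DM side: longest-prefix match over MapSets, then reduce the RLOC set to one.

    mapsets: EID prefix -> list of every ETR RLOC through which it may be reached.
    policy:  callable choosing one RLOC from that list (the DM's decision).
    """
    addr = ipaddress.IPv4Address(eid)
    matches = [prefix for prefix in mapsets if addr in prefix]
    if not matches:
        return None
    best = max(matches, key=lambda prefix: prefix.prefixlen)
    return MapRec(best, policy(mapsets[best]))


mapsets = {
    ipaddress.IPv4Network("240.1.0.0/16"): ["192.0.2.1", "198.51.100.1"],
    ipaddress.IPv4Network("240.1.1.0/24"): ["203.0.113.7"],
}
print(select_maprec(mapsets, "240.1.1.1", lambda rlocs: rlocs[0]).rloc)  # 203.0.113.7
print(select_maprec(mapsets, "240.1.2.3", lambda rlocs: rlocs[0]).rloc)  # 192.0.2.1
```

Only the MapRec ever leaves the DM, so the ITR's cache stays small while the DM keeps control over which RLOC each ITR uses.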
Where mapping info is stored:
- TRs in both APT and LISP cache retrieved mapping info.
- LISP sites don't store the entire mapping table; they retrieve mappings on demand via a remote poll (from the destination ETR).
- APT: each AS has default mappers (DMs); a DM keeps the entire mapping table. ITRs retrieve MapRecs via a local poll (within the same AS). TRs are the ISP's PEs and delete unused MapRecs after some TTL.

Dave Meyer: why have TRs at the ISP? A: see later, under deployment motivation and scenarios.

APT example (slide 12): on an ITR cache miss, packets are sent to the local DM, which forwards the packet and sends a cache-add to the ITR. The DM controls the policy of which MapRec to send to which ITR.

Handling transient failures: a PE fails, a CE fails, or a PE-CE link fails. Must do 2 things: (1) handle packets in transit (heading to the failed point), and (2) notify the ITRs. LISP doesn't mention (1) and has an aggressive push for (2). APT reroutes packets in transit to the local DM, which finds an alternative path, forwards the packet, and notifies the DM in the source AS about the failure.
Dave Oran: but what if an ISP doesn't want to tell others that one of its ETRs went down? A: we assume ISPs want to help get packets to customers.
Christian Vogt: can you get a ping-pong effect if two routers go down? A: we have mechanisms to resolve it.

Map dissemination: uses DM-BGP (a separate BGP session running on a different TCP port). Only default mappers peer via DM-BGP. Unlike LISP-ALT, DM-BGP does not create a routable topology; it only distributes mapping info. Updates are signed cryptographically by the originator.

Incremental deployment: ISPs benefit from a scalable routing system, so they should deploy APT. The first ISP turning on APT becomes an APT island; neighboring islands can merge to form a larger one. Packets are encapsulated/decapsulated as they pass through the island. Details skipped.

The talk then compared incentives for ISP vs. end-site deployment.
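The APT cache-miss path described earlier (packets sent to the local DM, which forwards them and sends a cache-add), together with TRs deleting unused MapRecs after some TTL, can be sketched as an ITR-side cache. Assumptions: the "TTL" is treated as an idle timeout refreshed on each use, the DM's forwarding and cache-add are collapsed into one function call, and all names are hypothetical.

```python
import ipaddress
import time


class MapRecCache:
    """ITR-side cache of MapRecs; entries unused for `ttl` seconds are dropped."""

    def __init__(self, ttl: float, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.entries = {}  # IPv4Network -> [rloc, last_used]

    def lookup(self, dst: str):
        now = self.clock()
        # Expire MapRecs that have sat unused for at least ttl seconds.
        self.entries = {p: e for p, e in self.entries.items() if now - e[1] < self.ttl}
        addr = ipaddress.IPv4Address(dst)
        matches = [p for p in self.entries if addr in p]
        if not matches:
            return None
        best = max(matches, key=lambda p: p.prefixlen)
        self.entries[best][1] = now  # using an entry keeps it alive
        return self.entries[best][0]

    def cache_add(self, prefix: str, rloc: str) -> None:
        self.entries[ipaddress.IPv4Network(prefix)] = [rloc, self.clock()]


def itr_forward(cache: MapRecCache, local_dm, dst: str):
    """Hit: encapsulate toward the cached RLOC. Miss: hand the packet to the local
    DM, which forwards it and answers with a cache-add (modelled as a return value)."""
    rloc = cache.lookup(dst)
    if rloc is not None:
        return ("encap", rloc)
    prefix, rloc = local_dm(dst)
    cache.cache_add(prefix, rloc)
    return ("sent-to-dm", rloc)


def local_dm(dst):  # hypothetical DM that always answers with the same MapRec
    return ("240.1.1.0/24", "203.0.113.7")


now = [0.0]
cache = MapRecCache(ttl=60, clock=lambda: now[0])
print(itr_forward(cache, local_dm, "240.1.1.1"))  # ('sent-to-dm', '203.0.113.7')
print(itr_forward(cache, local_dm, "240.1.1.9"))  # ('encap', '203.0.113.7')
now[0] = 61.0
print(itr_forward(cache, local_dm, "240.1.1.9"))  # ('sent-to-dm', '203.0.113.7')
```

Injecting the clock lets the expiry behavior be exercised without sleeping; note that in this model no packet is dropped or queued on a miss, which matches APT's "avoid packet loss" goal.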
Partial deployment: APT: an ISP can move to APT unilaterally and gain the benefit of a reduced routing table. LISP: first-mover LISP end sites require PTR functionality; do they depend on the ISP to provide PTRs?

Mapping info retrieval by TRs, local vs. remote pull. APT: poll from the local DM; pays a cost in distributing mapping changes and in DM storage. LISP: poll from the EID owner through the ALT hierarchy; no need to proactively distribute mapping changes; minimal storage requirement. But what about the polling delay? The load on ALT routers? Empirical evaluation is needed to compare these tradeoffs.

Mapping info retrieval, flat vs. hierarchical infrastructure: APT essentially floods info to all DMs. ALT: who hosts the higher-level ALT routers? It's an economic/trust issue.

APT Q&A

Dino: LISP is not really a hierarchy; there's lots of mesh. If there are choke points, there can be caching. A: yes, but caching loses source-specific TE.
Sue Hares: security slide: the details are "complex". Are signatures verified at each node, or is stuff added at each node in a distributed fashion? A: trying to verify the source of the announcement, but don't care about the path it took to get there.
Sue Hares: rapid movement or some major outage: how is it handled? Databases are pushed, but how do you know nothing has changed at the source? Want mappings to be up to date. Maybe update daily? A: failures don't show up in the mapping database; only permanent changes in edge connectivity trigger mapping updates.
Erik Nordmark: sees this working when everyone has a full table, but how does it relate to aggregation? A: no aggregation for EIDs.
Dan: met the LISP team a week ago. ... There are differences from LISP, but mostly in the mapping service; both are map-and-encap. Plan on writing a LISP-APT draft as a step toward convergence.
Eliot Lear: big differences. The PI notion is very different in LISP, with pros and cons to both; need to explore. An ETR in the provider means a provider desiring stickiness can play games. What if you're a consumer? How does it look if you want to multihome?
Knows how it works in LISP, but not yet in APT.
Dow Street: what's the unit for multihoming: a business, a household, a device? A: we're talking about "people who own prefixes". ... You can have multihoming of both devices and sites, though full IP addresses (in the mapping table) won't scale in any design.
Chandra Arpanna: the goal of the exercise should be end-to-end reachability. There are already devices out there to deal with service providers who aren't providing the right level of service.

THE CHAIRS: aiming to meet on Friday in Philly.

THE END!