Secure Inter-Domain Routing WG (sidr)
IETF 80, Prague, Czech Republic
Thurs, Mar 31, 2011

Times below are "wall clock"

Steve Kent presented the algorithm agility draft. See slides. The document has gotten little review and needs more. No comments.

Randy presented on three drafts: ghostbusters, origin ops, and rpki to router. No questions or comments.

Mingui Zhang presented on DBGP-A, an attack mitigation scheme.
Randy asked the presenter to relate this to the IDR draft on diagnostic signaling; deferred for later presentation of the other work.
Yes, it does use more memory.
Sandy asked why this is better than the memory used by history databases. Answer: similar amounts of memory, but this reuses the same database.
Keyur Patel, Randy, and Ruediger: if you pass this across ASes, won't this become an attack vector? A: It's optional. (!)
John Scudder: does this require contiguous deployment? No answer.
Shane Amante: this looks similar in principle to RFD, and suggests authors look into correlations between RFD and this effort.
Doug Montgomery: no greater threat of spoofed routes than now, and someone will detect the bad route.

Alvaro Retana presented on security state diagnostic messages. The idea is to push "I got an invalid route from you" messages downstream, and this may be useful during initial deployment.
Randy: you certainly haven't left out any features. In partial deployment, a sender who doesn't do this may not understand an error sent to it. Is this transitive? A: not transitive; one hop only.
Randy: so if the error is further downstream, you can't send the error to the original source of the problem?
Ruediger: how will you present the relevant information? Are you expecting a UI to present this info? It is more useful to add the UI to the router detecting the problems rather than the receiver.
Shane Amante: this is why operators deploy looking glasses. This looks like BGP growing to become something more generic (i.e. SMTP).
Rob Shakir: I like this because there's an existing trust relationship between me and my neighbor. In-band signalling is good.
Jakob Heitz: could this work in reverse? If I don't get one of these, can I assume the route was accepted?
Jeff Haas: this burns state, right? Some sort of transient notifications would be good.
John Scudder: this isn't about security; this is about telling your peer you filtered, so this shouldn't be limited to "filtered because of the RPKI".
[Missed one comment from Robert Raszuk.]
Ruediger: we want better ways of getting and displaying the information. Putting this in-band doesn't buy me much. Putting it in BMP looks like the right thing.
Ed Kern: this isn't about accept/filter policy, it's about valid v. invalid, and I could still be accepting the invalid.
Randy, to Russ: what do you feel about signaling across ASes the validity state, not policy? The ops document specifically says don't signal state across boundaries...
Russ: My concern is that the state of the two ASes WRT RPKI will not be harmonious, and you'll add confusion.

10:09am
Murray Kucherawy presented on IPv6 prefix discovery. See slides. Basic problem: RBLs break down in IPv6 -- address space is too large. Want to learn the size of the endpoint address delegation. How to convey that? WHOIS isn't standard and won't scale. Don't really want MTAs to learn to speak BGP. Question: is this the right venue?
Randy: the change rate of RBLs is faster than RPKI anticipates and might break RPKI distribution.
Rob: I don't think this is a good fit for RPKI. I don't see why you can't continue with the current technique using wildcards for /64's -- some growth. A: /64s will lead to some collateral damage.
Jeff Haas: I agree with Rob; the RPKI is the wrong answer.
Richard Barnes: [missed it].
Arturo Servin: don't use RPKI for this.
Randy: could you expound on what you mean by allocation sizes? A: I want to know how much address space the client has been assigned, so I know how much to cut.
Andy Newton: is there a draft? A: no.
Andy: this isn't as scary as I thought it was, since it doesn't sound like you're trying to use the RPKI, but we at ARIN would appreciate your feedback on our RESTful WHOIS.
Ruediger: [missed it].
Chris Morrow: BGP data is available through DNS from routeviews, FYI.

10:25am
Terry Manderson presented on geo-rpki. The use case was said to be use for research. There were questions about the level of granularity of the collected geoloc. There were several comments concerning privacy, particularly wrt issues of geopriv.
Andy: I don't want to repeat geopriv arguments here.

10:40am
Started a series of presentations on implementations:
Matthias Waehlisch: implementing rpki-rtr protocol in C, integrated in BIRD.
Randy: this is the first public, open-source reference implementation of the router end.
Andrew Chi presented on validator interop.
Chris Morrow: is the test suite available? A: repositories are online, but dynamic.

10:56am
Presentations on tools continued: Rob Austein, Tim Bruijnzeels, Andrew Chi, Hannes Gredler, Doug Montgomery, Andy Newton, Arturo Servin, and George Michaelson. See slides for presentations.

Questions to Hannes Gredler:
Randy: Bellovin and I looked at alternatives to SSH -- we have enough implementers in the room to negotiate this.
Ed Kern: clarify definition of mismatch? A: using different caches, they might disagree.
Rob: we don't need all of the weight of SSH: we need authentication and integrity, but not encryption/privacy, but MD5 might be too little.

Questions to Doug Montgomery:
Is source available? A: it will be, no copyright.
Richard Barnes: why a new SRx protocol in place of RTR? A: that's different -- we're offloading validation.
Ended at 11:49am

Secure Inter-Domain Routing WG (sidr)
IETF 80, Prague, Czech Republic
Friday, April 1, 2011

CHAIR(s): Sandra Murphy and Chris Morrow

AGENDA:

1) Administrivia
   5 minutes
   - Mailing list: http://www.ietf.org/mail-archive/web/sidr/index.html
   - WG Resources: http://tools.ietf.org/wg/sidr/
   - Minute taker: Benno Overeinder
   - Jabber Scribe: Wes George
   - Blue Sheets
   - Agenda Bashing

Randy Bush: I believe that people do not follow the recharter discussion on the list.
Sandy explains the new work: finished route origin (not sufficient), now path validation (protect the full path). Many of the subjects presented today are related to this new work.

2) New work discussion

a) BGP Security Threats: Threat Model for BGP Path Security
   Steve Kent
   http://tools.ietf.org/html/draft-kent-bgpsec-threats-01

Steve: Comment to the audience. If the list of adversaries is not complete, please post your comments to the list.

[slide 12]
Shane Amante: In this example, how is AS122 advertising a more specific? Is AS120 asserting a range, while the ROA only specifies /16?
Steve: Indeed, the ROA here specifies a /16 - /18 range, so /17 is valid. Forgot to update the slide.

[slide 13]
Peter Lothberg: Most networks use MPLS underneath IP. If you want to be MITM, you can attack that plane and it is invisible.
Ruediger Volk: Inter-domain MPLS is not that common in use... Yet.
Steve: We are solving the problem that exists now.
Randy Bush: This is not attempting to protect the data plane. (In an IMC paper in 2009, 70% of the ASNs have default. Not trying to protect policy or intent. Cannot know these, as data does not follow the control plane.) This is intended to protect the protocol (see that the BGP protocol is not gamed).
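
As an illustration of Steve's slide 12 answer (a ROA covering a /16 through /18 range makes a /17 announcement valid), a minimal sketch of the origin check follows. The prefix 10.0.0.0/16, the AS numbers 64500 and 64501, and the names Roa and origin_validity are assumptions chosen for the example; they are not taken from the slides or from any implementation discussed in the session.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Roa:
    """Hypothetical in-memory form of a ROA entry (illustrative only)."""
    prefix: object      # ipaddress.IPv4Network or IPv6Network
    max_length: int     # longest prefix length the ROA authorizes
    origin_as: int      # AS authorized to originate the prefix


def origin_validity(route_prefix, origin_as, roas):
    """Classify a route as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for roa in roas:
        # A ROA is relevant only if its prefix covers the announced route.
        if route.version != roa.prefix.version:
            continue
        if not route.subnet_of(roa.prefix):
            continue
        covered = True
        # Valid when the length is within maxLength and the origin AS matches.
        if route.prefixlen <= roa.max_length and origin_as == roa.origin_as:
            return "valid"
    # Covered by some ROA but matched by none: invalid. Otherwise unknown.
    return "invalid" if covered else "not-found"


# Assumed example data: a ROA for a /16 with maxLength 18.
roas = [Roa(ipaddress.ip_network("10.0.0.0/16"), 18, 64500)]

print(origin_validity("10.0.128.0/17", 64500, roas))  # /17 within /16-/18
print(origin_validity("10.0.0.0/19", 64500, roas))    # longer than maxLength
print(origin_validity("10.0.0.0/17", 64501, roas))    # wrong origin AS
print(origin_validity("192.0.2.0/24", 64500, roas))   # no covering ROA
```

With a ROA that only specified the /16 (maxLength 16), the same /17 announcement would come out invalid, which is the case Shane's question on the slide was probing.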
Shane: This instance of dropping an AS is interesting because in practice, if you think about AS sets, which we know are being deprecated, one could assert that this is a legitimate example of dropping an AS, because an AS-set is considered one AS in path length in the decision process. Point is that there are legitimate forms of aggregation (or there were), and therefore this is more of a malicious attack rather than a truly intended and legitimate one.
Sandy: So you are talking about the fact that the AS-set use would shorten the path length, not that it would eliminate it from the path.
Question of Sandy to operators: Is MPLS normally not (often) being used to cross AS boundaries? BGP is intended to be used across AS boundaries.
Gregory Cauchy (France Telecom): We do have many MPLS across AS for VPN businesses across well-known/friendly ASs. Only peering with networks we have business with. The VPN business is not for interconnectivity, but for giving our Level 2 VPN more geographic connectivity.
Rob Shakir (C&W): VPNv4, what you can manipulate is what you can do with a trusted speaker. An untrusted speaker in a single VPN is very different from an untrusted speaker in the Internet.
Peter Lothberg: Goal is to make the Internet work better. I am seeing lots of businesses where carrier is serving carrier. This is traded as a common commodity. You may not know where your traffic is going.
Randy: The charter is not about making the Internet better. Charter is to protect the BGP protocol.
Ruediger Volk: What may be missing for scoping in the charter: we are only caring for the public Internet. All of the VPN stuff that uses BGP is not taken care of because it uses a different NLRI. We can't do ROAs.
Shane Amante: To respond to Randy's point, the charter is implicitly making the Internet a bit better. It doesn't matter if one manipulates the path the traffic takes unless a) you don't have encryption or b) you care about the performance.
It is making the Internet better for some definition of better.
Steve: We have a goal of improving some aspects of the Internet, but if we were going to do that, Shakespeare would say: Kill all of the spammers!

[slide 15] [Kapela/Pilosov]
Geoff Huston: What's the difference between that and failure to withdraw?
Steve: Replays and failures to withdraw are similar.
Geoff: They cause the same thing, but they're not.
Steve: I'm not sure I can see the difference.
Geoff: It's still the same outcome, in the terms of analysis of threats...
Sandy: Difference between withdraw and replay: in the time in between, I don't get traffic.
Geoff: Failure to withdraw is trickier.
Steve: Recommend sending an email message to add this to the list of attacks.

[slide 17]
Shane Amante: Two comments. For Kapela/Pilosov, I have only seen the presentation, but never a white paper describing it in technical detail. It should be described in technical terms everybody can understand.
Steve: Not all work is based on the Kapela/Pilosov, it is just another example. If we need a stable reference to this work, we should contact the authors.
Shane: When I talk to people who participate and do not participate in the WG, this attack is asserted as the primary reason for this work in the WG. Second point. You've described a series of threats; the question is about the realness of these attacks. This was a demonstrated attack; what is not clear to the reader is the relative risk, the percentage of times the attack has occurred. This type of attack has only been demonstrated once, never documented, but may be happening more widely.
Chris Morrow: There are large ISPs where this happens on a daily basis. This has been observed by other ISPs of your size in Ashburn, VA.
Shane: Frequency, probability, demonstrate that we're solving for the likely attacks.
Steve: IETF doesn't do that because by the time it goes out there will be new attacks and we have to re-evaluate, and because other ISPs have said that they feel that they are not allowed to report other types of issues. But we do not develop protocols on the basis of the likelihood of these attacks. For example, SSL or TLS is not that interesting if I'm not using a wireless network, but we've adopted a model that says we're going to try to protect that path.
Shane: Understand, concern is either boiling the ocean or fixing a subset of problems that give you good enough security for something that can be widely deployed.
Sandy: One term that Steve did not put in his terminology was something about risk. How much damage can you expect and how frequently does it occur? This is very much "beauty is in the eye of the beholder".
Brian Weis (Cisco): This is a nice set of threats, but do the other items have a list of the threats that they mitigate?
Steve: Answer is yes.
Sandy: Put this question on the list.
Sriram (NIST): On Shane's question about aggregation: an AS can create a ROA for the aggregate and sign as originating AS.
Randy: Shane, I assure you that these were seen in the wild. Those presentations will be presented next time. No one has been able to report, but we were overjoyed to find them so that we could demonstrate that they're present.
Warren Kumari: While we believe most of the misoriginations are accidental, it's hard to prove that.
Rob Austein: Keep timescale in mind. The fact we only see a few now doesn't mean they won't become a problem later. Set the wayback machine 20 years: theoretically you could do a TCP attack if you were on path. Now you can buy a box off the shelf for $99 that does this; it is called a NAT box.
Sandy Murphy [hat on]: In your document you address attacks; are these intended to be a complete list of the attacks that the WG should address?
Steve: These should be a comprehensive characterization of the attacks we care about; not every possible combination, just classes.

b) BGP Security Requirements: Security Requirements for BGP Path Validation
   Randy Bush
   http://tools.ietf.org/html/draft-ymbk-bgpsec-reqs-02

Going through the draft and highlighting the main points. General requirements for the protocol are:

Section 3.4 Crypto Payloads: Sign Paths
If you do that, you're not likely to do it on a 7200 or M5 right now. We've just got the RPKI and origin validation into test images. The edge routers will be moved in 5 years to less loaded parts of your network. IETF process to get this out the door will take 2-10 years. Edge router lifetime is about 5 years before they get moved. This will be a race to see which is slower. This is not tomorrow's routing, but the day after's.

Section 3.6
4k limit to PDU was imposed by FDDI, will be moved to 64k -> big enough

Section 3.10
Today, you can pack common updates together and fill the max PDU size. Problem is on the receiving router: applying policy. The signed package decomposes and the sig is no longer valid so you can unpack. Presented an experiment a couple of IEPG meetings ago, showing that worst case was 2:1 when packing was disabled.

Section 3.12
Shane Amante: Backward Compatibility: Is this trying to speak to universal deployment of BGPSec or not? It's talking about backward compatibility.
Randy: All it is saying is that routers that are speaking BGPSec need not be compatible with routers that are not. If you're both BGPSec-speaking routers, BGP doesn't have to be compatible with BGP as we know it today; if you're not both BGPSec, it does.
Rob Shakir: If you are changing BGP, why use BGP at all?
Randy: Answers with allegory, but in essence because it looks like the beast.
Geoff: Now I understand why I'm confused. Issues about incremental deployment versus backward compatibility.
What you want is a distinction between the way info is exchanged and the objects you are exchanging. If we are speaking some hinky thing, I have to translate or drop stuff so that it becomes BGP4 as we knew it. The other thing that is going on is that secret-squirrel stuff that you and I have: if I send it onward as BGP4, no one else can put more secret-squirrel stuff on there. When we did BGP ASN 4 byte, we allowed the secret-squirrel stuff to go between non-secret squirrels. We called that incremental deployment. This doesn't do that.
Randy: This document intentionally finesses the question of whether trust information transits non-trust-speaking oceans between islands. In the actual BGPSec document proposal, the answer is that it doesn't. This is a matter of discussion. The trust/security people are fairly adamant that it shouldn't, and I trust them.
Geoff: In observing the transition IPv4-IPv6 and AS-AS4, I understand what the security folk want, and appreciate it. But there is another path of practical deployment in a world of money.... What you just said is important for everyone in this room. Eventually getting there and how we get there is more important than what happens at the end.
Geoff is emphasising this is extremely important to understand. It is a critical part of this work.
Question about Sharon's paper. Trying to understand critical mass of deployment is one thing when incremental deployment happens as a wavefront rather than a disease. Big debate over partial path signing. And there are people in this room that are strong proponents of this.
Randy says he is not smart enough to understand the implications.
John Scudder: Was trying to understand if Geoff was proposing something different should be in the requirements doc, or if that was a tangent.
Geoff: I'm not smart enough to understand the implications. Requirement assumes knowledge of an outcome. Partial deployment is great, but weird in the context of path.
Rob Austein: To be clear, it's not lack of interest in partial path signing. Whenever we looked at it, it started looking like that famous cartoon... Step 2, "and then a miracle occurs", and it was hard to bridge that gap.

Section 3.15 soBGP vs SBGP
Problem [with soBGP] is that providers do not want to publish the data needed for soBGP, and it is highly volatile (changes every 30 seconds).

Section 3.20 non-transitive
Is important.

Section 4.7
Validation state only - policy will tell each router what to do.

Brian Weis: Time dependency of these future routers. What are the restrictions or allowances on BGPSec in relying on external devices? Hard dependence on time. Most routers use NTP.
Sandy: Please send this to the list.
Rob Shakir: Consider AS-override. It probably is a matter of saying "not in scope, doesn't handle". If you are trying to say that the AS-path is true, then all of the hacks we have to lie about the AS-path are sort of a problem (which are currently legitimate tricks).
Randy: This is the life of security!
Shane: Removing private ASNs is a widely deployed mechanism within providers. Not stated if this is within or out of scope. Other issue: I am confused. If you look at 3.4, discussion says this may require new hardware. Check email exchange between Shane and Chris on the sidr email list.
Chris answers: This was for Route Origin.
Shane: Does it need to run on the control plane? Section 3.4 seems to suggest that this is designed to operate strictly in the control plane.
Answer of Chris: You do this over BGP for path security.
Andrei Robachevsky (ISOC): Didn't find anything about scale/convergence in this requirement document. Wonder if this belongs here.
Chris Morrow: Already very important.
Andrei: But not in the document.
Chris answers: The impact has been a critical topic. Send this to the list.
Peter Lothberg: Getting this into provider networks. But the people who have the biggest budget are the ones that make more money charging for VPNs and such.
How do we make this attractive for them too?
Chris: Has not really been discussed.
Peter: More generic model. If you want a secure BGP, please look at all applications of BGP and make sure that it is applicable.
Randy: Secure INTERDOMAIN routing.
Wes George [scribe]: You can have inter-domain VPNs.

d) An Overview of BGPSEC
   Matt Lepinski
   http://tools.ietf.org/html/draft-lepinski-bgpsec-overview-00

Two drafts of Lepinski, overview and protocol, are a straw-man proposal. It is a first cut to achieve the requirements Randy described, and meet the new charter deliverable. The focus of these documents is to have a simple and understandable protocol with correct semantics to talk about.

[slide 3]
Ilya Varlashkin: What if an AS forwards an advertisement which was not selected as their best path?
Matt: This is a non-goal of the current documents. But if you have suggestions to solve this, would love to hear them.
Randy: Says this will work for add path and diverse path.

[slide 4]
Jeff Haas: No different than being able to verify that any path you're receiving is being used for forwarding. BGP may have been lagging behind its advertised state. May have selected, but have not made it through the queue. You don't want to try to couple this too tightly together.

[slide 5] "negotiating bgpsec"
Sending is much easier than receiving...
Shane Amante: SHA256 is being used here?
Matt answers: Which algorithm is being used must be discussed. Crypto was chosen because that's what we're using in RPKI, but we can discuss it as a separate item. Changing algorithms is hard, so we should talk more about the right one.
Shane: Have you done studies on what the theoretical size of a routing table would look like with all of these signatures?
Sriram did some computation; this will be taken to the list.

[slide 6]
Wes George: Question about partial/incremental deployment with routers who do and don't speak BGPSec.
Answer: With iBGP, the route reflector needs to speak BGPSec.
This will be included in the document.

[slide 9]
The crux of the proposal, what data is signed.

[slide 14]
Geoff: There is a strange twist in this. Hypothesis is that in the current version, you accept routes and do validation in your framework. And I receive updates and accept your judgment and process incrementally. With trust anchors, I can use a different set than you, and come to a different result using slightly different trust anchors.
Randy: Your hypothesis is close. It is not only trust anchors. The RPKI is a distributed database, and is not necessarily consistent between different parties. So different ASes can have different perspectives. I choose what to propagate to you, you receive it and base your actions on your evaluation.
Brian Weis: You should not be using lots of routes that you think are garbage.
Matt: Protocol does not say you must only sign things that are validated.
Randy: Put another way: when I sign and send, I am attesting that this is what I received.
Matt: Indeed, and I am also attesting that I chose for policy reasons to send to my neighbor.

[slide 20]
Rob Shakir: Operational reports mailed out show how many updates there are on the Internet each week. Analysis on BGP updates and how it scales, RFC 4274?
Matt: Don't have good advice on how to set expire time; it is up for discussion (maybe days if it's stable). Maybe adding 1-2 updates for stable routing every day or two.
Rob: Problem isn't how many per prefix, but how many prefixes there are. Understand new hardware may be required. Large changes in how scaling and such are managed.
Matt: Indeed, this needs to be discussed.
Jeff Haas: You can make recommendations to implementers of BGP: if it's an update that has no changes but timing, it can be a refresh, not a new update.
Matt: Thanks, send to the list.
Jeff: 2nd observation, might help with path hunting algorithms. Expiry times that are too short are an attack on systems on the Internet.
Recommend that keys that require refresh be done in a way that they can be updated as a part of an already pending update.
John Scudder: Report about BGP updates per day is amusing, but not necessarily useful because it's an average. Most interesting in a BGP implementation is how it processes peak load. Hopefully implementations are smart enough to manage this.
Paul Hoffman: Why not in hours instead of seconds?
Matt: Good question, we could include some mechanisms to deal with this. Something to discuss here on the list.
Warren Kumari: A remark: there is stuff that is changing, but there are benefits too.

[slide 23] Final notes
Wes George: In this implementation, are there the same statuses as with route origin validation?
Matt: No, valid/invalid. But have to consider the situation with not signed.
Wes George: Question with respect to time expiration. Maybe I want to deal with cases where, from a policy perspective, I might want to apply a different policy if something is expired, versus something verified and invalid, and no information available.
Matt: It is a bit stronger if you have a path with expired correct signatures than no signed path at all.
Rob Austein: And you have to calibrate what is expired.
Wes: Does using a graduation of days/weeks/months expired make sense? On statuses: do we need one for expired instead of just invalid?
Jeff: Think about the fact that this infrastructure is somewhat reusable. We are not signing the entire packet, we are signing an attribute.
Matt: Design goal was to sign as little as possible and still get good security semantics.
Jeff: Majority of the requirements for VPN and other NLRI is a means to sign, so keep this in mind for reuse. Not necessarily design for it as a primary requirement.
Rob Austein: In principle agree. Difficulty is that signing is not a solution. Signatures have to be made with keys that are associated with something that has the right property. It may not necessarily tell you anything.
Must analyse to see what attestation you can make based on the data: what is the trust model.
Doug Montgomery: Gaps about receive model and fetching keys. Assumptions that they will be preloaded and stored by validating cache.
Matt: Fetch is a better word. You will fetch from a local store.
Wes George: You mentioned that BGPSec should not have a negative impact on BGP performance. Give us a best estimate of scaling and convergence impacts (table size, computation time, ...).
Doug: Explains how it could work with the implementation/protocol. Given how the protocol is specified, this option is still open.
Jeff Haas: Comment on convergence. Generally not well-understood what the inputs are for single-system convergence, let alone multisystem. So what is the impact on convergence?
Wes George: Agree with comments from Jeff. Not the broad overall world-wide convergence, that is too much to ask, but the milliseconds the processing cycle takes.

f) General Discussion

Sandy: We used all the time, we had much discussion, and might continue, as there are no other sessions going on.
Randy Bush: Request adoption of the documents as WG documents.
Sandy: Question of adoption of the drafts taken to the list. You do not necessarily have to agree with what is written in the documents, but do you agree the WG should work on them?
Randy: The BGPSec design group [on the slide] has plans to continue to work on this, but Matt used "you" in his presentation, so "you", the WG participants, should/must get involved in this.