Minutes taken by Brian Weis.

WG Status - Paul Hoffman (PH)
Slides: http://www.ietf.org/proceedings/79/slides/ipsecme-0.pdf

PH: The two active documents in the WG will be presented today. We assume that people have read the docs and are aware of the open issues (http://trac.tools.ietf.org/wg/ipsecme/trac/report/1).

Failure Detection - Frederick Detienne (FD)
Slides: http://www.ietf.org/proceedings/79/slides/ipsecme-2.pdf

FD: By taking parameters on the wire and the QCD Secret (not shared), a QCD token is created. The token is sent in the IKE_AUTH exchange. The peer also returns a QCD token that it has similarly generated. If one of the VPN gateways crashes, it loses its information except for its QCD Secret. It will receive ESP packets from the still-live peer, and returns an INVALID_SPI as well as a token sent in the clear. The non-crashed peer sends an IKE message back containing just a header (liveness test), which allows the crashed peer to re-create its token.

Q: Header isn't protected?
A: This is the gist of the issue we'll be covering later.

Issue 198: Do we really need the QCD token for the initiator too?

Tero Kivinen (TK): I don't believe that to be relevant to this case. QCD is useful for the asymmetric case. Otherwise, the gateway that restarts can recreate the SA.

Paul: We included the gw-gw case in the problem statement.

TK: Making it symmetrical adds lots of complication to implementations, in particular when there's mobility involved.

FD: On the contrary. Going asymmetric forces users to configure which side is doing QCD. In practice, this is a real problem ... I have seen many customer cases over the years and this is a big problem. Failure detection turns a high-severity problem into a low-severity or a non-problem.

Paul: Tero, what you're saying is that because this doesn't happen much, and it's a fair amount of implementation overhead, you'd like to remove this case?

TK: Yes. In most of our gw-gw cases there are better and faster recovery methods.

PH: But Fred, you're saying that if you don't cover this, then you have to configure more.

FD: Yes, you have to make a judgment call re: which side initiates.

TK: In those cases where it matters, only one side can do QCD anyway.

PH: You're way oversimplifying.

TK: It's usually the branch office that drops off, and the headend doesn't care. You can use other methods that are faster to fix this problem. Most problems are a result of bad implementations.

Paul: Are we at the right scenarios?

FD: No vendor has a perfect implementation. Failure detection is a safety net.

Yoav Nir (relayed by Jabber scribe): My view may be colored by my own implementation, but we use the same implementation for site-to-site or remote access GW. It's easier to implement both sides as both maker and taker. In any case, the original initiator isn't necessarily the token maker.

PH: Tero, for this item, say "which scenarios you feel are important" rather than it's not good for some cases.

Tero: I have sent 100s of lines of text earlier, and I have explained that. My implementation can do this using existing IKEv2 methods and it might be one half-trip faster.

FD: We agree either side can rebuild the tunnel. But when you're traffic triggered (which is common), the traffic pattern may not be symmetric.

TK: And I have explained how this can be dealt with using standard IKEv2 methods.

PH: Tero, when you repost, specify which scenarios you are dealing with.

FD: We should say in the doc that we want a “one size fits all” scenario, where end users don't need to be involved in configuring it.

The issue remains open. We need to analyze it per use case.

Issue 199: Section 7.4 is mostly wrong.

TK: I don't think we need the whole section 7 any more.

PH: It would be important to show people 5 years down the road why we did this.

Issue 200: Section 8 ignores IKEv2 text.

FD: The picture is also incorrect.

Issue 201: Interdomain Gateways do not need QCD at all

FD: This should be merged with 198.
We should have a “one size fits all” solution, independent of the traffic going inside the tunnel.

Issue 202: Token makers generating the same tokens w/out sync DB

TK: (Slide 15) MITM is a much more powerful attack. This only requires a passive listener. The attacker doesn't even need to capture the gateway's reply.

FD: MITM is the wrong term, I agree. But even this type of attacker can do more dangerous things (TK disagrees, FD promises to expand on the mailing list).

Dan Harkins: I don't see why this is a problem. It's a case with a cluster of nodes in standby. This is a stateful protocol, they're sharing state, there's no need to do QCD? And if you're not sharing state, why share the QCD token?

FD: You're right, if they share state there's no need for QCD. The thing we see more and more is that synchronizing the SAs between gateways in hw appliances is expensive.

DH: My point is that if you're not sharing state, don't share the QCD token. It's causing problems by sharing it.

TK: They assume that the client will cache them and they can recover, using session resumption.

Pratima Sethi: Sharing the QCD token helps you recover faster without synchronization. Faster than rebooting.

PH: I'll restate (slide 14): Active/Standby are only sharing QCD secrets because it's easier than sharing the whole state. Not the traditional HA, a QCD sharing scenario.

FD: We offer a tradeoff compared to the universal failure detection provided by the liveness test. Speed of recovery vs. false positives.

Yoav: It makes sense for a hot standby case without sharing, you want the failure to look like a really fast reboot. We should prevent the standby gateway from replying. It's dangerous for load-sharing.

Gregory Lebowitz: It's not just dangerous, the load sharing case simply won't work. They'd have separate IP addresses.

TK: The document already says that all cluster members should know whether an IKE SA is active. Other solutions don't work.

FD: Gateways don't even know if another gateway is up.

Ahmad Muhanna: Does a gateway know if it's in standby mode?

FD: Not in all scenarios. Not in anycast scenarios.

PH: This contradicts the current text in the document.

Some discussion was had about what the definition of active-standby really is.

Yaron Sheffer (relayed by Jabber scribe): Should not reopen the IKE threat model. IKE is resistant to active MITM.

Steve Kent: If it's really a standby it doesn't get traffic. Let's use the correct terminology.

FD: The device is “standby” with respect to that particular peer. Maybe change the terminology. Devices may not know if they are active for a specific peer at any given time. Proposes a solution: the client should not accept a QCD token where its state machine indicates none should be coming.

TK: An attacker can modify an (unauthenticated) ESP packet to cause the gateway to eventually respond with a token.

FD: This requires a liveness test that will take care of the situation.

PH: We should decide first whether we want to expand the doc's coverage. We should split #202 into 2-3 issues.

High availability protocol open issues - 45 min - Dacheng Zhang

After an HA event, the new active node might not have the most recent information (e.g., IPsec replay counters).

PH: This HA proposal is only for "tight" IPsec clusters, unlike the discussion about QCD.

(slide 4) One solution is for the new active standby to request the newest information from the peer using an IKE notify. Delta in replay counter is sent, not the new value.

Steve Kent: Pick the largest one, and send that in response?

Paul: Yes, although might need to be more explicit. [Later note: this part was misunderstood by several people during the discussion – YS].

IKEv2 peers also need to negotiate the ability.

Issue 1: Multiple failover
Issue 2: How to synchronize the failover counter amongst different cluster members

PH: These issues are about when there are 3 or more, and 2 go down.

TK: I noticed that on slide 8 there is a problem: the ESN bit is overlapping the Critical bit. Needs to move elsewhere.

Paul: Those who read the draft, did we close out the known issues from the last drafts? Does it match the requirements document? Does anyone feel that we're not? (no response) Does anyone have any open issues? (None) We might go to WGLC on this sooner than QCD.

FD: Make sure ESP packets are not accepted while there is a replay “window of opportunity”.

PH: This should at least go into the security considerations.

IKEv2 Re-authentication

PH: Keith Welter wanted this to be discussed and he's not here. There are some reauth issues with IKEv2bis, and there are some possible solutions. We might look at different reauth methods in the future, in a faster way.

TK: I think we talked about this, we have an RFC out there talking about reauth. Are we coming back to that?

PH: We said last time we don't want to over-complicate the base IKEv2 spec.

TK: Adds complexity even if a separate RFC.

PH: Now he's asking, is the problem big enough and we can do something less complicated? I'm not a big believer in this problem: if you do reauth, it's going to take some time. You don't want to over-optimize. But other people differ in opinion.

Yoav: The idea is to keep the SAs up during reauth.

IKEv2-- (IKEv2 "minus minus") - Tero Kivinen

Defining the minimal set of features that an IKE implementation needs to have. Lots of people discussing that IKEv2 is too complicated, other IETF groups (e.g. CORE) wanting to use IPsec but not the full IKEv2 functionality. They have constrained devices. There's lots of optional stuff in IKEv2. Explain that you only need 4 packets if you only need one SA. I started to take the IKEv2 docs and cut out stuff, and I ended up with 6-7 pages base spec plus 20 pages of payload description. All MUSTs except the one requiring support of certificates.

PH: If you're willing to push together a rush draft, there's a couple of groups that might be interested in that. This will probably not be a WG document.

Sean Turner: If you copy-and-paste the stuff out of the IKEv2 draft it might end up as pain due to errata having to be duplicated. You want to keep a pointer to the original. Also there are potential copyright issues.

TK: I also added some new text.

FD: Even in IKEv1 we are seeing too minimal implementations. So this is useful as an introduction. The doc should be seen as a stepping stone towards a full implementation.

Yaron: If this goes standards track, we will have a hard time synchronizing with the base spec.

PH: Informational.

There was a discussion that this should be a profile kind of document, Informational, describing what would minimally be needed to implement IKEv2. The only issue is certificate support being a MUST.

Yoav: This finally documents the mythical RFC 5996 “minimal implementation”.

IKEv2 with CGA - Jean-Michel Combes
Slides: http://www.ietf.org/proceedings/79/slides/ipsecme-3.pdf

PH: Are any of these drawbacks different than CGA in general?

J-M: Some are specific to IPsec.

PH: What about the hard-coded algorithms?

Steve Kent: The CSI WG talked about hard-coded algorithms but never did anything about them. The threat model is different. CGAs were intended to be "very local" -- used between a host and 1st hop gateway. When you move it here there's a different set of concerns. CGAs represent a "here's another way of doing autoconfig. and have a certain continuity" but it's for local uses. One of the features of IPv6 addresses was address generation for privacy reasons, and using multiple addresses concurrently. If you talk about CGAs as a static thing and beyond that local net, we're violating that other feature's trust model and that's a bad thing.

J-M: There is a MIP6 RFC (RFC 4866) on using CGAs to secure MIP6 signaling. (Ahmad: route optimization, not general MIP6).

PH: This is for information now, and you'll let us know if you plan to progress it?

J-M: Yes.

Sean: Apologizes for having held the PAKE proposals. We have 3 proposals. Will have an independent reviewer go through them in the next few weeks and will then decide how to proceed.

Paul: Please re-read the active drafts. We are still making protocol changes.
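
Note on the Failure Detection presentation: FD's description of the QCD token (derived from parameters seen on the wire plus a local QCD Secret that survives a crash) can be illustrated with a minimal sketch. The minutes do not record the exact construction, so everything below is an assumption made for illustration only: the use of HMAC-SHA-256, the choice of the two IKE SPIs as the on-the-wire parameters, and the function names make_qcd_token and token_matches are all hypothetical.

```python
import hashlib
import hmac
import os


def make_qcd_token(qcd_secret: bytes, spi_i: bytes, spi_r: bytes) -> bytes:
    """Derive a token from the local secret and the two IKE SPIs.

    Assumption: HMAC-SHA-256 over SPI-I || SPI-R. The minutes only say
    "parameters on the wire and the QCD Secret".
    """
    if len(spi_i) != 8 or len(spi_r) != 8:
        raise ValueError("IKEv2 SPIs are 8 octets each")
    return hmac.new(qcd_secret, spi_i + spi_r, hashlib.sha256).digest()


def token_matches(stored: bytes, received: bytes) -> bool:
    """Token taker's check, done in constant time."""
    return hmac.compare_digest(stored, received)


# Token maker, before the crash: the token is handed to the peer in IKE_AUTH.
qcd_secret = os.urandom(32)  # the only value assumed to survive a reboot
spi_i, spi_r = os.urandom(8), os.urandom(8)
token_given_to_peer = make_qcd_token(qcd_secret, spi_i, spi_r)

# Token maker, after the crash: all SA state is gone, but the SPIs arrive in
# the header of the peer's IKE message, so the same token can be recomputed.
token_after_reboot = make_qcd_token(qcd_secret, spi_i, spi_r)

# Token taker: compares the token received in the clear with the stored one.
assert token_matches(token_given_to_peer, token_after_reboot)
```

The sketch shows why the rebooted gateway needs no per-SA state: the secret and the SPIs from an incoming IKE header are enough to recompute the token. It also shows why sharing the secret across cluster members, as debated under Issue 202, lets any member produce a valid token.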