> Final Agenda - TEAS Virtual Interim Meeting (RSVP Ingress Protection / Egress Protection)
> Jan 28th, 2016, 10:00 EST | 15:00 UTC (Duration: 90 mins)
> Drafts:
> https://datatracker.ietf.org/doc/draft-ietf-teas-rsvp-ingress-protection/
> https://datatracker.ietf.org/doc/draft-ietf-teas-rsvp-egress-protection/
>
> - 10 min - Intro/Agenda (Chairs)
> https://www.ietf.org/proceedings/interim/2016/01/28/teas/slides/slides-interim-2016-teas-1-1.pdf
> Lou Berger: We want to be clear that implementation status is requested by the IESG, so we'll need to talk about that. It's important that we can demonstrate interest and implementation to get a standards-track document through the IESG.
> - 65 min - RSVP-Ingress Protection
> - 35 min - Solution Proposals: Comparative Analysis (Huaimo Chen)
> https://www.ietf.org/proceedings/interim/2016/01/28/teas/slides/slides-interim-2016-teas-1-0.pdf
> https://www.ietf.org/proceedings/interim/2016/01/28/teas/slides/slides-interim-2016-teas-1-0.pptx
Huaimo presents.
Greg asks: Why do you say this mechanism is faster than e2e protection? Would you agree detection time is the same?
Huaimo: Yes.
Greg Mirsky: Question on the claim that speed of recovery drives recovery.
Huaimo Chen: Two things to bear in mind. One is detection time, and a later slide will give the whole picture. In addition to detecting the failure at the ingress, we also need failure detection at the CE downstream.
Greg Mirsky: An alternative solution has been defined in linear protection, and it solves this exact problem.
Greg Mirsky: With detection and switchover, speed is not a real factor.
Huaimo Chen:
Lou Berger: Greg brings up another point, at least indirectly. In addition to having an existing solution that works at perhaps the same or a different speed, it's important that we not define another solution if we already have one that's good enough.
Huaimo Chen: Speed is one factor, but we also need to consider operability.
Ross Callon: Has a question about function.
If we want to cover any failure along the path, end-to-end protection solves the general problem, but we may want to go through the sides first.
Lou Berger: I'd like to hear that discussion, and it's fine to defer it to the open discussion, but we need to cover it.
Ravi Torvi: Slide 5: On the semantics of the Path message... you're saying it's just like any other Path message, but it seems to be completely different. Can you comment?
Huaimo Chen: It's different; it has an egress protection object.
Ravi: The slide doesn't say there are three steps. You have a totally different state machine on the backup.
Huaimo Chen: I have a slide with details later; we can discuss it there.
Tarek Saad: (at slide 6) Is there an implicit assumption that the primary and backup ingress are connected back-to-back with a link? Can they be multiple hops apart?
Huaimo Chen: Yes, but we need a tunnel so that the backup is one RSVP hop away.
Tarek Saad: Are we re-using the same RSVP session for the primary to relay messages for the backup?
Huaimo Chen: Yes.
Tarek Saad: So the backup ingress will behave as an LER in that case.
Huaimo Chen: Yes.
Tarek Saad: Thanks; that's not been clear in the past.
Lou Berger: (slide 7) So, to be clear on what you said earlier: even though the backup terminates on P1 on the slide, it's actually the same egress as the primary?
Huaimo: Yes.
Tarek Saad: Slide 9: For completeness on the relay method, if the backup is multiple hops away I'd need to configure another tunnel, right?
Huaimo Chen: Yes.
Tarek Saad: So that's another bullet you need.
Huaimo Chen: Yes, good catch.
(slide 12)
Pavan Beeram: Can you elaborate on how the sync happens between the primary ingress and the backup ingress?
Huaimo Chen: If something changes in the primary LSP, it can be duplicated to the backup.
Pavan Beeram: So in the case of the relay message, you need to figure out additional ways to maintain the state sync.
Huaimo Chen: Any changes in the primary can be duplicated to the backup.
Pavan Beeram: So you're talking about re-using the same Path message used on the primary and redirecting it elsewhere. Any other changes? No change to the ERO in the message from the primary ingress to the backup ingress?
Huaimo Chen: EROs are copied across.
Pavan Beeram: So all object processing now needs special semantics.
Huaimo Chen: No, it's just copied over, so nothing special.
Lou Berger: It sounds like this isn't a normal Path message; it's more a notification to the backup that it needs to provision something. To me it would make sense for this to be a Notify rather than a Path.
Huaimo Chen: The Path from the primary ingress to the backup ingress is a special Path message; we'll check cross-connections.
Pavan Beeram: Early versions of the draft had a different message, I think.
Ravi Torvi: Initial versions defined a new message type; later it was changed to a Path. I don't remember why.
Pavan Beeram: In the current form we've changed the semantics of all object processing. Conventional ERO processing will fail, as the backup ingress isn't on the path.
Huaimo Chen: After some thought we decided that using an existing message was easier.
Pavan Beeram: OK, it seems clear that there's some work to be done on state sync and on the relay message between the primary ingress and the backup ingress.
(slide 16)
Pavan Beeram: For the relay message, is it the same session to the backup or a different one?
Huaimo Chen: Same, but different state. The relay message has extra state for the Path and Resv messages to the backup ingress. So for the proxy-ingress method, we need to keep the state for the Resv messages received from the backup egress.
Tarek Saad: On relay message point 3 (Resv coming back from the primary): does it have a purpose besides being an ack?
Huaimo Chen: It relays protection state to the primary ingress node.
Tarek Saad: Thanks.
And about the proxy-ingress: you talk about two sessions. Why?
Huaimo Chen: It's better to talk about state. So there are two Path messages and two Resv messages.
Tarek Saad: So how many sessions?
Huaimo Chen: At a high level, the session for the relay message is the same. For proxy-ingress all sessions are the same, so just one. But if we want to compare from the scalability point of view, the number of states is related to the number of messages, as we need to keep state for each message.
> - 25 min - Open Discussion
Ravi Torvi: Would you consider tweaking the session handling after this discussion? i.e. it's not really a Path message, but something that needs special handling.
Huaimo Chen: Yes, it needs special handling.
Ravi Torvi: Pavan mentioned that objects have a different interpretation.
Huaimo Chen: Yes, the ingress protection object is different and may contain labels. Labels for proxy-ingress are carried by the Resv message.
Greg Mirsky: I wanted to come back to the earlier question about detection. I recall the document for ingress protection has an OAM session so that the backup ingress can detect that the primary ingress node has failed. Is that right? So this requires an OAM session between PE6 and PE5, so that the backup monitors the primary ingress, right?
Huaimo Chen: That would be nice to have. If a source delivers traffic to PE5 first and then switches it to PE6, the forwarding entry on PE6 is active from the beginning. PE6 will forward traffic to the bypass LSP, so traffic continues in the data plane. We also have to keep the primary LSP up by refreshing Path messages; this is achieved by detecting the failure of PE5, after which we have to put Path messages into the tunnel and toward the next hop (P1). We can use OAM, or a faster method if available; there are a number of ways to detect the failure, but faster ones are better.
Greg Mirsky: So what method are you proposing? I don't see how routing can identify that the node isn't functioning. For example, if you use OSPF and expect that your LSA will age out, that takes 30 minutes.
Huaimo Chen: Yes, that's too long. We can check routes in the routing table, so if you have no routes to PE5 you can say that it's gone.
John Drake: You could wait for a sufficient period of time and verify whether individual links to the node have gone down, or all of them, but that's a long time.
Huaimo Chen: As soon as one link goes down, we can check the rest.
John Drake: But that's not rapid detection.
Tarek Saad: Do we need to determine whether the node has failed? If I detect at the source that my link to PE5 has gone, isn't that enough? Can't you rely on the link to PE5 going down?
Greg Mirsky: If PE6 detects that the link to PE5 is down, that doesn't mean the link from PE5 to P1 is down.
Tarek Saad: But a source behind both could check the liveness of PE5.
Greg Mirsky: What the source checks isn't the liveness of PE5, but the liveness of its connection to PE5.
Tarek Saad: OK.
Greg Mirsky: So PE5 can be fine as far as P1 is concerned.
Tarek Saad: Yes, that's the case in transport protection.
Greg Mirsky: So if we lose the link between the source and PE5 and switch over, PE5 doesn't know that this has happened.
Ravi Torvi: The upstream source detecting the failure is one model; the backup has to protect against the primary going down.
Greg Mirsky: My point is that this method is no better than multi-homing.
Ross Callon: To me, this discussion needs to be clear in the spec. What are we protecting against, and what are we not protecting against? How do we distinguish link down vs. node down vs. multiple links down?
Lou Berger: Today's discussion isn't about whether this is complete; I hope we all agree there's a lot of work to be done. The question is which of these approaches we should pursue in the WG. One thing that comes to mind is implementation: do we have people interested in implementing one or both models? Bear in mind we're on the record.
Huaimo Chen: Huawei were interested in implementing the relay method for ingress protection, and had a prototype for egress protection.
Lou Berger: Anyone else want to comment? Greg seems interested in linear protection?
Ravi Torvi: We (Juniper) don't see any customer interest in ingress protection given the complexity and the issues with failure detection, so we want to come up with a simple solution and decide whether we really need this.
Lou Berger: So you don't like the relay message and don't think it's consistent with the rest of RSVP?
Ravi Torvi: Yes.
Ross Callon: On simplicity vs. complexity: this seems a lot more complex than end-to-end protection, and it only solves part of the problem.
Huaimo Chen: Here the complexity is on the vendor side, but we provide simplicity to the provider.
Lou Berger: So this solution focuses on a narrow piece of protection; it doesn't worry about the end-to-end problem or the transit problem. Clearly some authors believe this is a problem that needs to be solved. Anyone want to talk about that? Is an optimised ingress protection solution something that people think is important?
Ravi Torvi: We don't see a need to solve this.
Lou Berger: So you think end-to-end or other methods are good enough? Or this just isn't a problem?
Ravi Torvi: It's just not a problem.
Huaimo Chen: In the beginning we saw issues in real deployments and presented these at previous IETFs with early draft versions. End-to-end protection for P2MP LSPs is really complicated; this is easier. But the motivation for all of this comes from real deployments. The advantage is that if we protect the ingress and egress nodes we have a whole solution, so it's fast, easy and efficient.
Ravi Torvi: Ross also mentioned that reliably detecting primary failure is something we've gone through before; we need a lot more than a BFD session. That leads to deployment issues, so this isn't just an RSVP extension; there are a lot of deployment issues too.
Lou Berger: It seems from the folks here that there's some agreement that there are problems to solve and multiple ways to solve them. There were concerns about the original proposal, to the point that there's now a second one in the doc. And there's agreement that to finalize this we need more details in the document; it's been thought about, but it needs to be documented properly. So there are three options: carry on trying to make a standard; say there's not enough support; or say we've done some good work but we're not sure how operationally viable it is, so we should run experiments, and we could have a single experimental draft with multiple solutions. I'd be interested in hearing from folks on this. And I'd like to specifically ask the MPLS chairs for their opinions, as this work started in that WG.
Loa Andersson: Yes, this started in MPLS and went to TEAS. If I view this from Huaimo's point of view, we're asking him to redo what he's done before, so I don't know where this is going. If we say that we want to do an experiment... Huawei have done that. So I don't know what there is to add.
Lou Berger: Linear protection has been standardized and can solve this problem, and that wasn't the case when this was first adopted in MPLS. So we had a real problem, but it's now solved by other means.
Loa Andersson: I'll need to look into that more to say that with confidence.
Ross Callon: Not sure I can speak as MPLS chair, as there's no consensus in the WG, but my own view... there are times in the IETF when we take on work and we don't know how it'll play out until people do a lot of work (e.g. LISP). In this case a lot of work has been put in to determine that something doesn't look as optimistic now as when we started. Publication as Experimental is what we usually do in this situation. There's no alternative to doing the work and seeing how it turns out.
I don't feel good about asking someone to do all this work and then not being able to publish it.
Huaimo Chen: Even though this draft moved to TEAS, I think there was good support in WG meetings.
Lou Berger: Anyone else have comments?
Huaimo Chen: Also, implementation of this is not that hard.
Pavan Beeram: So even if we go the experimental path, the draft is nowhere near complete, right? There are more details that need to be put in.
Huaimo Chen: I've also talked to service providers who like this and think it increases scalability.
George: If you're protecting end-to-end you don't need to protect hop by hop, and if you protect hop by hop you have a lot of state.
Lou Berger: The issue is that end-to-end starts from one LER, and this starts from two.
Huang Lu: In our network we have a lot of enterprise customers, so local protection is very useful to us; if we have ingress + egress protection and FRR, that's useful.
Lou Berger: So existing protection mechanisms are too slow?
Huang Lu: Yes.
Lou Berger: And you care about protecting ingress and egress nodes?
Huang Lu: Yes.
Lou Berger: Do you care about what specific solution solves the ingress/egress problem? Do you care about the implementation, or just what it does?
Huang Lu: I care about fast protection.
Lou Berger: So I think we've heard enough to say that people care about the problem, but we don't have consensus on the mechanism, we don't have consensus to throw out one of the options, and we don't have support to push for a Proposed Standard. I'd like to talk to Pavan offline and then bring proposals to the list; we can't make a decision here.
Pavan Beeram: Yes, we'll discuss between the chairs and get back to the WG. We'd abandon the work if there were no interest, but that is a bit extreme. If we heard that there was an immediate need for a solution, we'd ask to continue the work as a standard. Or we could go with the experimental approach.
We'll use the list to get more details on implementations and interest in this.
Lou Berger: We can separate the ingress and egress discussions; we should continue the egress discussion on the list (I have a question I want to ask about it).
Adrian Farrel: We keep skirting around what this problem is, and we haven't really nailed it down. George asked if you're doing FRR at every point along the path. Maybe we need a one-page disposable document that sets out what the problem is and what the potential solutions are, so that folks can decide whether or not they're interested.
Lou Berger: Can you help guide that?
Adrian Farrel: Yes.
Chairs: Thanks to all for coming; we'll discuss further on the list. Special thanks to Huaimo for putting the slides together.
> - 05 min - Discussion Summary / Next-Steps (Chairs)
>
> - 15 min - RSVP-Egress Protection (Open Discussion)
>
> Meeting Materials -
> https://www.ietf.org/proceedings/interim/2016/01/28/teas/proceedings.html
>
> Etherpad -
> http://etherpad.tools.ietf.org:9000/p/notes-interim-2016-teas-1
Attendees:
Vishnu Pavan Beeram (chair, leading meeting)
Lou Berger (chair)
Adrian Farrel
Andy Malis
Autumn Liu
Dan Romascanu
Daniel King
Dhruv Dhody
Matt Hartley
Dieter Beller
Greg Mirsky
Haomian Zheng
Huaimo Chen
Huang Lu
Igor Bryskin
Jeffery Zhang
John Drake
Lin Han
Loa Andersson
Padma
Quintin Zhao
Ross Callon
Tarek Saad
Xufeng Liu
Yimin Shen
Ravi Torvi
Mateusz Waldman
George Swallow