Last Modified: 2004-10-14
|Done||Submit draft on calculation of IGP routes over TE tunnels to IESG for publication as Informational RFC|
|Done||Submit initial Internet Draft on IP Fast Reroute Framework|
|Jun 04||Submit initial Internet Draft on Basic IP Fast Reroute mechanism|
|Aug 04||Review various mechanisms for Advanced IP Fast Reroute|
|Oct 04||Submit IP Fast Reroute Framework to IESG for publication as Informational RFC|
|Oct 04||Submit specification on Basic IP Fast Reroute mechanism to IESG for publication as Proposed Standard|
|Nov 04||Select the Advanced IP Fast Reroute mechanism|
|May 05||Submit specification on Advanced IP Fast Reroute mechanism to IESG for publication as Proposed Standard|
|RFC3682||E||The Generalized TTL Security Mechanism (GTSM)|
|RFC3906||I||Calculating IGP Routes Over Traffic Engineering Tunnels|
RTGWG IETF 61
1. Agenda bashing, administrivia (chairs) [5m] 00:05
2. Document status (chairs) [5m] 00:10
RFC 3906 published (informational)
GTSM -- more comments need to be integrated, last call before Minneapolis. Implementors please inform mailing list/authors
Framework, loopfree, MIB well along.
uloop prevention design team constituted (names already sent to list). Desire to keep membership small (already not small, so maybe "less big"). Goal total coverage if possible, extensible if not. Design team to report back by December '04.
3. Basic IP FRR spec update (Alia)  00:25
Document revved to be more of a spec and less of a survey. Need to read framework too because that's where definitions section is!
To do: Multihomed prefixes, link selection, SRLG.
Need more people to read & comment. Comments to list please.
4. IPFRR MIB (Alia) [20m] 00:45
Only the first of many MIBs.
Doesn't cover SRLGs (yet?)
Includes protected route table (with NH, alternate NH including alt NH type)
Includes unprotected route table (just route and why)
Global routing stats (various kinds of route counts)
Not covered: IGP (IPFRR enabled? local holddown time?), LDP (protected/unprotected FECs, alt NH info including alt label). Other (small) MIBs will probably be needed for these.
Please comment on: is this grouping of MIBs appropriate?
Alex Zinin: re protected/unprotected route tables, why use a different table instead of augmenting an existing table?
Alia: I don't know how to do that, my understanding is you can't really extend a MIB, this one is indexed the same as an IP routing table MIB which I think is as good as it gets.
Alex: so how do I use these tables?
Joel Halpern: please remember that a MIB is a MIB, it's just used for management purposes, it doesn't drive the implementation.
Alex: Are these different sets of routes, will it be recorded twice, once in normal routing table and once in unprotected table?
Bill Fenner: I will sometimes admit to being MIB-literate. This is the right thing to do. Indexes are dup'd but info isn't.
Stewart Bryant: How do we report dynamic info like "repair attempted but failed"? No doubt there will be other dynamic info.
Alia: Q is what level to detect, what level to report at. Probably will be in IGP MIB and not this one.
Stewart: I think this is really important for O&M, because these faults are transient so we must be very attentive to this issue.
Alia: Yep. We need to make sure that we can actually detect the errors we put in the MIB!
Stewart: Need to go to ipfix? Maybe doesn't even need to be in MIB.
Alia: We should talk about it.
Stewart: We'll try to write a draft up about it.
Don Fedyk: We did consider that. Error reason is in there but there's no history associated with it. Take a look at what we have and see what needs to be improved on.
Stewart: An example of what I'm talking about is we think we have a protection path but when we try to send a packet on it, it fails.
Is the MIB grouping sensible, are the MIBs sensible? Please read and comment or you will get what you deserve. Right now draft has u-turn alternates in it, should it include other candidate alternate types?
Stewart: First MIB should include basic, where there is common ground, then have a different MIB for advanced.
Alia: All I mean is that there is type defined for "u-turn" for alternate type, and a row in interface for "can I break u-turns".
Alex: Maybe we should just rename u-turn to "reserved"?
Comments to list please. Very few admit to having read it.
Alex: We'll ask on the list about making draft a WG doc.
David Ward: Who will do IGP MIBs?
Alia: Are you volunteering?
David: No. Someone from this WG should do the work and then present it to the IGP WGs.
5. Micro-loop prevention DT report (Alia, Mike) [20m] 01:05
Discussion [20m] 01:25
Trying to bring order to chaos, we have too many partial solutions right now. Trying to explain, divide solution space into types, consider types, summarize.
Basic problem: Microloops resulting from conventional IGP converge-as-fast-as-you-can behavior lose traffic, undoing IPFRR goodness.
Reason for uloops: Independent/asynchronous decisions. Loops are temporary! Duration can be much longer than IPFRR time though. Duration driven by relative time to update FIBs (i.e., degree of asynchrony). No way to guarantee two routers will take similar length of time to update FIBs (from one router's PoV the network change may cause just a few routes to change -> fast download, from another PoV many routes may change -> slow download).
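(Illustration added in editing, not presented at the meeting.) A minimal sketch of how asynchronous FIB updates produce a microloop. The three-router topology, the names S, N and D, and the `forward` helper are all assumptions made for the example: before the failure N reaches D through S, so if S switches to its new next hop before N does, packets bounce between the two.

```python
def forward(fibs, src, dst, max_hops=8):
    """Follow next hops from src toward dst and return the path taken."""
    path = [src]
    node = src
    while node != dst and len(path) <= max_hops:
        node = fibs[node][dst]
        path.append(node)
    return path


def has_loop(path, dst):
    """A path that hit the hop limit without reaching dst is treated as a loop."""
    return path[-1] != dst


# Hypothetical next-hop tables for destination D.
# Before the S-D link fails: S sends direct, N sends via S.
old_fib = {"S": {"D": "D"}, "N": {"D": "S"}}
# After both routers converge: S sends via N, N sends direct.
new_fib = {"S": {"D": "N"}, "N": {"D": "D"}}

# S has installed its new FIB, N has not yet: the transient state.
transient = {"S": new_fib["S"], "N": old_fib["N"]}

looping_path = forward(transient, "S", "D")
converged_path = forward(new_fib, "S", "D")
```

In this toy case the loop lasts exactly as long as the gap between the two FIB installs, which matches the point that duration is driven by the degree of asynchrony.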
Solution: Controlled convergence. Inevitably makes convergence slower, but this is OK because IPFRR repair covers failure allowing leisurely convergence. But: still want to keep traditional method as fallback in case of multiple failures.
- Controlled information flow (incremental cost change)
- Controlled distributed behavior (synchronized FIB installation, ordered FIB changes, path locking)
(See slides for full comparison matrix, highlights follow)
- Incremental cost change -- can take hours
- Synchronized FIB install -- seems simple, but isn't, and dependency on NTP
- Ordered SPFs -- no changes in forwarding plane. Doesn't deal with SRLG (only single failure is supported). Need to extend algorithm to a per-destination basis. Long delays if large network diameter. Worst case can be pretty long...
- Path locking -- cons: complete coverage requires additional forwarding mechanisms. Pros: small delay in RIB/FIB installation.
Detailed description of the above four methods
Ordering by signalling
Alex: Is node failure a SRLG case?
Mike, Alia: No. Node failure can be handled by any of these techniques.
Ordering by delay
"Lollipop topology" (for example) can make delay-ordered SPF slower than needed (known techniques are more pessimistic than needed).
Can combine delay and signalling (optimization of delay-based version, point is that signalling doesn't need to be reliable since delay backs it up)
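(Illustration added in editing, not presented at the meeting.) A rough sketch of why delay-based ordering scales with network diameter. The rule used here is an assumption, not the design team's algorithm: every router waits in proportion to how much closer it is to the failure than the farthest router, so the farthest router updates first and the router next to the failure updates last. The function names and the per-hop timer `fib_time` are hypothetical.

```python
from collections import deque


def hop_distances(adj, root):
    """Breadth-first hop count from root to every reachable router."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def update_delays(adj, failure_adjacent, fib_time):
    """Assumed rule: farthest router updates at t=0, nearest updates last."""
    dist = hop_distances(adj, failure_adjacent)
    farthest = max(dist.values())
    return {router: (farthest - d) * fib_time for router, d in dist.items()}


# Hypothetical chain A - B - C - D, with the failed link just beyond D.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
delays = update_delays(chain, "D", fib_time=2.0)
```

With a 2-second per-hop timer the router adjacent to the failure waits 6 seconds on this four-router chain, and the wait grows linearly with the hop count. A rule this crude also delays routers that never used the failed link, which is one way a delay-based ordering can be more pessimistic than needed.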
Backwards compatibility is a problem.
Alex: how much is it really a problem? Can't you just announce the capability in your IGP and only start using the method when all routers support it?
Mike: yes but that means if you infect your network with one router that doesn't support this, you've broken the scheme.
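(Illustration added in editing, not presented at the meeting.) The capability-gating idea in this exchange can be sketched as below. The capability flag and the `method_active` helper are hypothetical; no specific IGP extension is implied.

```python
def method_active(capabilities):
    """True only if every router in the area advertises the capability."""
    return all(capabilities.values())


# Hypothetical view of what each router advertises in the IGP.
all_upgraded = {"R1": True, "R2": True, "R3": True}
one_legacy = {**all_upgraded, "R4": False}

active_before = method_active(all_upgraded)
active_after = method_active(one_legacy)
```

Adding the single legacy router R4 switches the method off for everyone, which is Mike's concern.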
Three epochs -- change discovery time, use transitional paths time, lock to new topology time.
Potential transitional path types -- tunnels, safe neighbors, packet marking, u-turn
Sorting out the possibilities -- what are the criteria? Time to be converged (ballpark: 10 sec), simplicity, SRLG support (or really, unpredicted multiple failure coverage), no additional mechanisms beyond IP (may hurt coverage), common additional mechanisms for this and other advanced methods, also work for LDP.
- Incremental cost change impractical
- Sync'd FIB swap -- skeptical about practicality
- Ordered SPF -- long delay, poor SRLG support -- enough to be an issue?
- Path locking -- seems most promising, many possibilities (ed: but, maybe it's just that the newest toy is always the shiniest?)
- Haven't thought of any new methods this morning but we haven't been to the bar yet
- Need more brain power on this, more discussion
Danny: Is incremental deployability a hard requirement?
Alex: Yeah, and is 100% coverage required?
Danny: Sure but is incremental really a hard requirement?
Alia: Path locking can be done incremental. You can't have a flag day.
Danny: Well not a flag day, but it would be OK to require all routers to have same version of code before solution becomes viable.
Alia: But still need to worry about turning it all on
Andrew Lange: That's what maintenance windows are for.
Vach Kompella: Re sync'd FIB swap -- If requirements were externally provided (and included atomic clocks) problem would be easier. Are we making the problem harder than we have to because we are inventing our own requirements?
George Swallow: Are you only worried about clock skew during failure?
Mike: Skew isn't the problem, problem is skew in FIB install time.
George: So clock sync is not the biggest issue here actually.
Mike: Yes although I'm nervous about inter-layer dependencies.
Stewart: Well if you can detect that NTP isn't working then you can just disable the loopfree thingy.
David Ward: So we've asked for a collection of requirements but have no place to collect them.
Alex: Actually we haven't asked for requirements.
David: How do you multicast?
Mike: General thinking is that you have to get the packet to the other side of the failure, can't just drop it off some place and use the unicast/downstream approaches because of RPF, etc.
Bill: Two halves to problem, other half is you need state to know where downstream neighbors are for mcast. So fast repair has to repair that state as well. You can get the packet to the other end of the failure OR get the join state down the repair path real fast.
Stewart: We're talking about for the repair, right? For the uloop convergence you have lots of time to fix up the mfib?
Everyone: Nope nope.
Bill: You're moving the tree around. PIM needs to get access to the new SPF topology before the new FIB is put into use, that might work.
Alia: At a minimum we have to not break mcast/make it worse! Secondary question is how to protect mcast too.
Alex: So getting back to uloop prevention...
David: Design team requested requirements, how are we going to provide them Alex?
Alex: Oh, thought you were asking about a requirements document
Alex: In particular SPs should try to respond to presenters' questions/strawman requirements. SRLGs? Less than full coverage? These are important because they will drive the selection of mechanism.
Danny: Where ARE we going to record the requirements?
Alex: The mailing list?
Alia: The taxonomy doc?
6. Update on draft-atlas-ip-local-protect-uturn (Alia) [20m] 01:45
- Explicitly marked packet identification (well known label?). Makes ID'ing potential U-turn packets easier, etc.
- Example algorithm for how to look for U-turn alternates. (Worst case is 1 additional SPF per neighbor.)
- Simplify alternate selection
- More detailed explanation considering link protection
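(Illustration added in editing, not presented at the meeting.) The draft's U-turn search is not reproduced in these minutes. As a point of comparison, the sketch below runs the basic loop-free test, dist(N, D) < dist(N, S) + dist(S, D), which also costs one extra SPF per neighbor. The topology, link costs, and function names are assumptions made for the example.

```python
import heapq


def spf(graph, root):
    """Dijkstra shortest-path distances from root."""
    dist = {root: 0}
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for nbr, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def loop_free_alternates(graph, s, dest):
    """Neighbors of s that are not primary next hops and will not loop back."""
    from_s = spf(graph, s)
    alternates = []
    for n in graph[s]:
        from_n = spf(graph, n)  # one additional SPF per neighbor
        is_primary = graph[s][n] + from_n[dest] == from_s[dest]
        is_loop_free = from_n[dest] < from_n[s] + from_s[dest]
        if is_loop_free and not is_primary:
            alternates.append(n)
    return alternates


# Hypothetical topology: S reaches D directly; N and M are other neighbors.
graph = {
    "S": {"D": 1, "N": 1, "M": 1},
    "N": {"S": 1, "D": 3},
    "M": {"S": 1, "D": 1},
    "D": {"S": 1, "N": 3, "M": 1},
}
alternates = loop_free_alternates(graph, "S", "D")
```

Here M qualifies as a loop-free alternate, while N does not because its own shortest path to D runs back through S. A neighbor in N's position is the kind of case that motivates looking beyond the basic test.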