Meeting Minutes

Note takers:
  John Scudder
  Sue Hares
  Christian Martin
  Warren Kumari
  Rob Shakir
  David Freedman

Compiled and edited by John Scudder

Agenda: Interdomain Routing (IDR) WG
WEDNESDAY, July 27, 2011
0900-1130 Morning Session I
206 A
=====================================================
CHAIR(s): Susan Hares
          John Scudder

o Administrivia   Chairs   10 minutes
  - Note Well
  - Scribe
  - Blue Sheets
  - Document Status
o draft-gredler-bgp-te-01   Slides   Stefano Previdi   15 minutes
o draft-frs-bgp-operational-message-00   Slides   Rob Shakir   20 minutes
o draft-raszuk-wide-bgp-communities-02   Slides   Jeff Haas/Robert Raszuk   10 minutes
o draft-keyur-idr-enhanced-gr-00   Slides   Keyur Patel   15 minutes
o draft-bashandy-bgp-edge-node-frr-00
  draft-bashandy-idr-bgp-repair-label-02   Slides   Jakob Heitz   10 minutes
o draft-zeng-idr-bgp-mtu-extension-00   Slides   Jie Dong   5 minutes
o draft-ietf-idr-add-paths-guidelines-01   Slides   Adam Simpson   10 minutes
o Path Exploration Damping   Slides   Mattia Rossi   20 minutes

Speaker shuffling time   5 minutes
Total   2 hours

Editor's Notes:
o Times indicated below are local time and are approximate. They may be useful for finding the corresponding section in the audio recording, http://www.ietf.org/audio/ietf81/ietf81-206a-20110727-0855-am.mp3
o The notation ~"quotation"~ (a tilde next to a quotation mark) is used to indicate an approximate quotation.

Chairs/Introduction
-------------------
Start time 9:00

Pending work items:
- Clarifications are required.
- GR required, and merged error handling document needs to be done.
- Route reflection also may need some clarification work.

Seemed important that we address individual drafts with GR for error handling - but all go into a merged document at the end (Sue). Combine them into one consistent space - can then see the error handling across the RFCs, and then changes there.

Status:
o: draft-ietf-idr-deprecate-as-sets: Move from informational -> BCP.
o: draft-ietf-idr-bgp-issues -- WGLC done, awaiting minor edits.
o: draft-ietf-idr-link-bandwidth -- waiting on implementation report.

Stefano Previdi: Advertising TE Information in BGP (draft-gredler-bgp-te-01)
----------------------------------------------------------------------------
Start time 9:07

One thing that is required is a change of name of the draft to reduce confusion. Definition of an API for components outside of the routing layer to get the topology information - e.g. for ALTO. No intention to do things between *routers*, rather to out-of-band devices. (~"we have no intention, out of scope, to leak routing information between *routing* layers. it is for extracting information from the routing layer"~)

Brief overview of ALTO - collecting topology information, doing some magic, and then presenting a map that gives the overlay topology - or giving an application some information about the topology where they require this.

From a deployment and operations perspective, putting this into the IGP at the right places (e.g. different ASNs, or different IGP areas/levels) is quite difficult. Also perhaps some security concerns with this. No passive mechanisms in IGP in general - therefore BGP is mainly used in the deployments of ALTO currently. They need more information than is currently available in BGP.

Added a new NLRI - for both global table and VPN contexts, therefore includes both SAFIs. Then add node and link attributes in the SAFI. Problem is how we encode this - without re-inventing the wheel. Encoding is quite link-centric - using the existing IS-IS encoding format where possible.

No changes to be made in BGP operations or state machinery - use the standard BGP path selection and distribution mechanism. BGP path selection will be used where the ALTO server sees multiple paths.

There are some TODOs for this draft - especially some around the intended behaviour so that we don't see the same level of messages as we might see in an IGP.
Some question as to whether they want to encode the area ID in the NLRI, or whether they should use a community or not.

9:13
Q: Sue: Why not timestamp updates?
A: We considered it. Rather than timestamp, why not use IGP sequence numbers? We're not sure we need it at all. We're extracting a topo from one component and putting it into another. It's like redistribution. We rely on BGP's native mechanisms to converge to the latest information. We could later add a timestamp or sequence number TLV. I'm not yet convinced that we really need it. The timescale we're talking about is minutes or tens of minutes.
Q: Sue: We can take the rest off line, it's a question of synchronization of data from multiple sources.

9:20
Q: (Name not captured): I don't understand the purpose of the draft. We already have TE types in the IGP, type 10 or type 11.
A: I think you're referring to OSPF LSAs. First, this really isn't a TE draft.
Q: Basically it seems you can get what you need using IGP.
A: Yes, we know the IGP can do that, but there are operational reasons to avoid having the ALTO server participate in the IGP. Putting the information into BGP helps secure the IGP.

9:22
Q: Acee Lindem: Another reason not to flood all the TE in OSPF is that doing so would be ... surprising ... from scaling PoV. But another thing. Why not just use PCE?
A: We did talk with PCE folks. Our use case is different. The PCE collects from the IGP and then serves up fully-computed paths.
Q: Are you saying this was just expeditious?
A: According to my understanding, there is no mechanism in PCE to extract information from different areas.

9:24
Q: Rob Shakir asked if there could be any analysis of the security requirements in the draft: "Often, security gets waved as a big flag".
A: ALTO servers are not permitted in the IGP today.
Q: Rob restates the point that the security requirements should be specified in a more clear fashion.
(~"Since you keep talking about security, you probably should expand your security section to answer these questions and provide your justification."~)

9:26
Q: Sue: It would help if you meet with PCE authors.
A: We did.
Q: Please check back with them.
A: OK

Rob Shakir: BGP operational message
-----------------------------------
Start time 9:27

Comes from request from chairs in Prague to look at diagnostic and advisory drafts and put together a merged framework.

Motivations: firstly error handling, secondly improving operations. In the first case, if we make the complexity of the error handling in BGP greater, then we need some way of our NOC seeing this condition between the two nodes. In the second case, decoupling of session and messaging, used in peering, useful for NOC-to-NOC, at IXP, send static message.

Capability signalled, TLV-based message. 4 TLV types: Advise, State, Dump, Control. These can be easily extended. Advise TLVs come from advisory; State and Dump TLVs come from diagnostic; Control TLVs deny access and provide rate limiting.

Why in band? BGP authenticated, existing control plane channel, especially in IXP case, information propagated relevant to the carrier session.

Security and convergence concerns exist; interleaving is a convergence concern. Use of Control TLVs and ability to completely ignore a message.

Overlap with BMP: two separate ideas, both useful and both combine well. BMP is out of band, separate socket, no overlap. Intra-domain and inter-domain scenarios are very different. Does not replace BMP at all.

Soliciting feedback, FAQ online. Next steps: revise to -01 based on issues raised, requesting IDR adoption.

John considers this an extension to advisory; request/response is the new work.

9:39
Q: (Name not captured, Huawei): Interesting. Last year I gave a similar draft in OSPF. The focus should be information that we can't get other ways.
A: OK

9:39
Q: Jeff Haas: My comments are that the additional complexity seems pretty heavy weight as far as the new stuff goes. That is a big concern. Secondary concern is that many of the things look as though they have disclosure issues.
A: Agreed. There is potentially an information leak issue. You might address it by limiting request-response interdomain, i.e. non-reply.
Q: Jeff: Another thought from implementation PoV. If we're putting in a query-response mechanism to ask "did you get this thing I sent you" there may be a problem associated with the fact that some implementations throw away received information even for routes stored in adj-rib-in.
Q: Jeff asks how this non-reply works with the request/reply form, and proposes a lighter mechanism, e.g. send back a sequence number or something like that instead of the full malformed update. Jeff also asks about malformed updates: will it cause a local crash when a MUP is received? Jeff's feedback is "Strongly reconsider anything which pushes back an entire update".
A: Rob suggests that draft should state clearly "don't re-parse errors". Also mentions draft includes list-of-nlri option. Suggests maybe just hex dump any returned malformed updates?
Q: Seems like what you really want is a transaction acknowledgement.

9:44
Q: Enke: NOTIFICATION doesn't work for Jeff's case because it brings down the session.
(Continued debate between Enke and Jeff regarding where malformed update should be logged and whether it should be sent back to sender.)

9:46
Q: Enke: Please go back to slide 2. In the case of planned maintenance, there is already a well-defined CEASE message but it may not be adequate because it doesn't include the expected duration. Maybe CEASE needs a subcode to give the expected duration?
A: CEASE doesn't provide enough context to be useful.
Q: Right, so augment CEASE with everything you need.
A: Problem is we don't know what we want. The flexibility provided in OPERATIONAL is a benefit -- the TLV is just a string.

9:48
Q: Jeff: I think Enke's idea is great. You could put the exact format you're proposing as a subcode. Put a string in the CEASE if you want.
A: Another use case is quiescing a connection but not actually downing it. It still becomes unusable but no CEASE.
Q: Warren: Yes, I agree completely, that is a valid use case.
Q: John: Following up on security/disclosure...
A: ... we have the option not to respond at all.
Q: My question for operators in the room is, would it be more common to turn off all possible disclosure features on EBGP, or to leave it on?
A: Dave Freedman: There are recommendations in the draft about this (default to off).
A: Rob: In Internet deployments, off, but in VPN EBGP might be on.
Warren: Also could be used by organizations that have multiple ASes who use EBGP internally.

9:53
John: Poll for those who've read. ("A fair number".) Having done that, how many are interested in adding the request/response functionality? (Speaking as author of Advisory likes structured message part better.) ("I see one half hand")
Robert Raszuk: Request/response already exists today with one-time ORF for example. So not fundamentally new.
Randy Bush: Agree generic TLV is better although Jabber/Skype peanut gallery is saying "just what we needed! IRC and Twitter for BGP!"
Rob: Problem with request/response is IF you need it, better to have it in base framework. Unless we add a mechanism for it later.
John: My sense of the room is that the comfort level with message structuring is good; with request/response, discomfort. Move that conversation to list.

Jeff Haas: Wide communities
---------------------------
Start time 9:55

(no q's)

Keyur Patel: Accelerated Graceful Restart
-----------------------------------------
Start time 10:01

10:09
Q: Jeff Tantsura: Seems kind of complex compared to NSR.
A: NSR doesn't protect you from unwanted session resets. Question is, can you get away with incremental updates?

10:10
Q: Jeff Haas: Rob, this is the acknowledgement I was talking about.
Could be leveraged for operational message ack.
Q: Rob Shakir: But in this proposal not each update has a specific ID, right?
Q: Warren Kumari: You could do something like an offset from a sequence number. Pretty funky though.
A: It's an implementation issue how often you increment your version.
Q: Jeff: Will you discuss why you chose it as a separate message instead of some other choice such as updating the marker?
A: It's not just updates you have to checkpoint, ORFs for example. Doing it as a separate message decouples it from any specific message.
A: Enke: Here we are really talking about the routing state, the superset of routing updates.

10:12
Q: Sriram, NIST: This will be very useful in the context of BGPSEC since updates are going to get more expensive.
Q: Sriram: You say you may need to do a full exchange in case your table is corrupt. Please elaborate.
A: I have to do it any time I don't preserve my Adj-RIB-In or -Out. For example if my policy changes such that I might have accepted some prefixes that were dropped by the previous policy.
A: Enke: One more scenario, specific to NSR, it's implementation specific. Can simplify existing implementations.

10:14
Q: (Name not given): Seems very complicated, CPU time, memory to do enhanced GR. Seems like spending a lot during runtime to save work at restart time.
A: Enke: Please be more specific about what seems complex.
Q: Versions have to be generated and saved. Uses CPU and memory.
A: Enke: If you are already doing BGP today, you are already keeping track of incremental data. All you do now is add a number to each update.
A: Keyur: describes the details more for session restart.
Sue: Let's move to list. Question for operators, how often do GRs occur in real life?

10:18
Q: Rob Shakir: This can happen often, and in a short space of time. Therefore I think this is a really good idea.
Q: Acee Lindem: This is more robust than NSR since not so many things need to be done perfectly. I think this is good.
Keyur: requesting WG adoption
John: How many read draft? (hands showed more than half the room). How many want to move it forward to WG? (roughly the same number that read it). Of course, take to the list.

Loop Free BGP with Repair Label: Jakob Heitz
--------------------------------------------
Start time 10:22

Q: Rob Shakir: Do you have any data showing how often this actually happens? Where you have one CE dual-attached to two PEs? Basically the CE is the cheapest piece so this scenario just doesn't exist in my network. So is this really an existing problem? I don't have it in my network.
A: You don't have dual-homed CEs?
Q: Rob: No, because CEs are cheap and tail circuits are expensive. I would have two CEs.
Q: Keyur: It's not a common case but severe.
Q: Rob: But for how long?
A: Jakob: In short I don't have any data for how common it is.

10:27
Q: (Name not captured, Huawei): Current implementations can solve this issue. (essentially directed forwarding with 3107, long description)
A: Yes, that works in an active/standby case. But in an active-active case both PEs must do an IP lookup.
Q: Label allocation behavior should always be consistent. Use IPFRR to cover the active-active case.
A: How is that different from what you said the last time?
(further debate of this point)
John: Please take it off-line.
John: This kind of work will likely get moved into RTGWG.
John: This seems like one particular use case, we should understand whether or not this is a sufficiently general solution.

Jie Dong: MTU Extended Community for BGP
----------------------------------------
Start time 10:34

10:39
Q: Jeff: Comments -- from IDR PoV this is somewhat interesting and I've heard it spoken of before. Is there an assumption this will be fed back into LDP after the inter-AS hop?
A: It can be but is not currently part of the proposal.
Q: I would suggest adding that.
Q: I am co-author of a draft to use BFD for PMTUD. This is not the only solution.

10:41
Q: Jeff Tantsura: I think you should clearly differentiate between 2 and 3 label when you run labeled IBGP vs. distribute LDP into BGP. Quite different, different procedures for passing MTU.
John: How many read draft? ... a few. Comments/feedback to authors/list.

Adam Simpson: Add Path Applicability
------------------------------------
Start time 10:42

No questions/comments

Mattia Rossi: Path Exploration Damping
--------------------------------------
Start time 10:57

11:17
Q: Keyur: Good work. One comment on slide 36 -- knowing a few different implementations ...
A: ... AS 12 is announcing a longer path to AS 11
Q: ... right, and you want to announce that quicker to AS 11 but delay it to AS 13 ...
A: exactly
Q: ... then if they are in the same update-group... it would be interesting as to how you stop update on one side but not the other. For this reason, might want to consider this more as an inbound rather than outbound processing thing. Send it, but let the other guy process it a little later. We can continue off line.

11:19
Q: Jeff Haas: for path length comparison are you using absolute length or number of unique AS?
A: absolute
Q: suggest rerunning with unique-as path length, i.e. factor out prepending
Q: second comment, the further you move from the origin, the more path-hunting becomes a problem. as a heuristic can you decrease your timers as a function of distance from origin? maybe low-exponential with distance
A: great idea, thanks

11:21
Q: Jeff Haas: this works best when AS path length is the main determinant. but in many networks other policy is at work.
A: Yes we know. We've observed the data some and do see the longer path updates are common. Needs more investigation.
Q: skh: need to investigate larger data set.
A: Yes, please audience share your data sets. Mail to author.

11:23
Q: Randy Bush: most operators don't know where their tie-break is. Experimental code in Cisco and Juniper to answer that question.
I did a something-NOG presentation on this about a year ago. I would *love* to deploy this code in a richly-connected router. (Contact Randy.)
Q: Randy: Also I am not RECOMMENDING flap damping. We recommend changes to make RFD less harmful if it's used. I think your approach is quite interesting. I'm less confident than Jeff about how much state you have to keep.

11:25
Q: Jeff Haas: MRAI isn't as common, or implemented as strictly to the RFC, as you may think.
A: Yes, I know it may differ from implementation to implementation. In my experiments I used Quagga as-is.
Q: Randy: Also many people have turned it off or down.
Q: Jeff: We have it off by default. Some customers turn it on. But as it impacts convergence, sometimes it is not turned on. Convergence trumps control plane load as long as control plane can keep up at all. Tragedy of the commons.
A: Planning more core network experiments.

11:28
Q: Rob Shakir: About wanting more MRT data, are you aware of the RIPE RIS project?
A: No, please send pointer.