Interdomain Routing (IDR) WG
THURSDAY, November 11, 2010
1300-1500 Afternoon Session I
Garden Ballroom 3
=====================================================

CHAIR(s): Susan Hares
          John Scudder

o Administrivia                                      10 minutes
  Chairs

Note takers: Jake Khuon, Shane Amante, Roberto Fragassi (roberto.fragassi@alcatel-lucent.com)
Merged notes: Sue Hares

Document Status:
- 3 new WG docs:
  + wkumari-deprecate-as-sets
  + idr-fsm-subcodes
  + add-paths
- WG LC, revs needed:
  + idr-link-bandwidth
  + idr-rfc3893bis
- WG LC completed, needing action from chairs:
  + bestpath-selection-criteria
  + mrai-dep
  + bgp-identifier
  ... co-chairs to try to get these done before Prague

Ready for WG LC:
- bgp-issues

Close to ready for WG LC (?):
- idr-bgp-mibv2
- idr-as4octet-extcomm-generic-subtype
- best-external

Ask implementers to let co-chairs know if they have implemented:
- bgp-identifier
- idr-dynamic-cap

As a reminder:
- To progress a draft to RFC, two interoperable implementations are needed:
  + stakes are high, bar should reflect that
- Combine related docs:
  + fewer, more comprehensive, higher-quality docs
  + individual drafts are OK; over time overall themes emerge, but combine them to progress them

o draft-chen-ebgp-error-handling-00                  15 minutes
  Revised Error Handling for BGP Updates from External Neighbors
  Enke Chen

Problem:
- RFC 4271 says that a malformed attribute resets the session
  + *Relaxed* for optional transitive attributes
- Other cases not covered, in particular:
  + well-known attr. such as AS_PATH
  + length of opt. transitive attr.
- Have seen the AS_PATH issue a few times in the past; lately we've seen the 2nd case, which causes significant problems in the field
- Session reset is disruptive

Goals:
- Minimize the scope of networks impacted by malformed updates
- Limit impact to only the prefixes involved in malformed updates: IOW, preserve good, punish bad
- Need to maintain protocol correctness by avoiding routing & fwd'ing loops

Proposal:
- Partially revise error handling in case of malformed updates from eBGP sessions
- DO NOT reset the eBGP session if at all possible

Assumptions are the following:
- As long as the prefixes in a malformed update can be correctly parsed, treat the update as a withdraw
- Consider this "safe", because policies are already applied at eBGP boundaries. IOW, "treat-as-withdraw" is just like applying a filter, which could already drop routes that aren't permitted by policy.
- Previously, a session reset would get operators to notice that a problem had occurred. With this proposal, treat-as-withdraw could mask the problem, so it's critical to provide a logging facility so that operators are alerted to a problem they need to fix.
- When dealing with malformed updates, how much can you trust the data being exchanged? Very strongly recommend the following "sanity checks":
  + NLRI field must be parsed completely & correctly
  + every message needs to validate the BGP message header (sync. header, message type & length)
  + Assuming these checks are OK, there is a low probability of other data on this session being corrupt/malformed

Parsing of the NLRI Fields:
- Must put the prefixes at the beginning of the message
  + MP_REACH/MP_UNREACH are the first attribute(s) in the UPDATE
  + Use MP_REACH/MP_UNREACH for IPv4 unicast as well

What about iBGP?
- "Treat-as-withdraw" is unsafe for iBGP in general, because if one router drops an UPDATE and another still has it, the result could be forwarding loops
  + If you have a full mesh of tunnels it would be OK, but in general you can't do treat-as-withdraw for iBGP
- Error handling (session reset) remains unchanged for iBGP
- If an iBGP session flaps, identify the "bad" prefixes & apply filters at the ingress ASBR to prevent further flapping
- iBGP would require too many protocol changes, so not proposing to modify iBGP rules at this time
- Also, believe issues with malformed updates are typically with eBGP

John Scudder: If we go forward with this, it's a big change to the protocol & the correctness assumptions of the protocol, so I encourage WG members to look carefully at the draft and think about it.

Questions?

Richard (Renwei) Li from Huawei: Believe there could be routing loops at the AS level with this proposal, because if you 'treat-as-withdraw' bad prefixes from an external peer, but accept the same prefixes from other AS'es, it could result in routing loops.

Enke: Don't believe this is an issue. Every eBGP update is valid. Fundamental to BGP is that you can apply policies on eBGP sessions that deny learning prefixes from one AS and accept them from another AS for TE reasons, and it doesn't cause a forwarding loop. Fundamental to BGP is that every eBGP update is treated as valid and you can choose to use whatever eBGP routes you receive as best-path.

Richard Li: Although every eBGP update is treated as valid, it may not be selected as best-path. If it's not selected as best-path ... maybe inconsistent.

Enke: If you look at treat-as-withdraw, we're not talking about just discarding the UPDATE if it's malformed. With treat-as-withdraw, we're proposing that the ASBR that detects a malformed update must advertise the prefix to its internal BGP peers as if it were a withdraw.
Richard Li: Issue is with best-path selection, b/c you have withdrawn this route from this AS, but another AS doesn't think this route has been withdrawn and may continue to forward along that path.

Enke: This isn't an issue, because treat-as-withdraw is the same as resetting the whole session. With the latter, you're withdrawing all routes, compared to the former, where you're withdrawing only the small number that were malformed.

Tony Li: When you withdraw a prefix, it propagates. Once it's treated as withdraw by 1 system, everyone downstream sees it as a withdraw from that AS.

Cengiz: If you withdraw a prefix from 1 AS, but accept it from other AS'es ...

Tony Li: That's fine. It's going to go around the AS that found the error.

Richard Li: Other AS'es may not withdraw the same prefixes from that AS ...

Tony Li: Still OK. It looks just like BGP policy, in terms of dropping a prefix as a result of policy.

Enke: Treat-as-withdraw is just like applying policy to filter out prefixes.

Cengiz: Bugs can go unnoticed.

____: It will be noticed ... covered in another presentation.

Cengiz: When you are further downstream, you may not notice bugs because of this.

____: We don't like to see errors break everything.

Richard Li: Maybe I need to show you a picture.

Enke: Yes, please.

Richard Li: Is there a plan to update the MIB to help debug this problem, i.e., to know that the session is not down, but not fully up either?

Enke: Haven't thought about it, because MIBs are developed later in the process.

John Scudder: Fair comment, & the slides mention needing good debugging tools for this, so we'll need it ...

Cengiz: In general it's good not to reset the session, but if malformed updates happen a small % of the time, I'm afraid that further away from where the problem is occurring, the problem could go unnoticed and not get fixed.

Robert Raszuk: Going to present a solution in a minute to help "notice" the problem.

Enke: Good point. We assume good routes will dominate and bad routes will be infrequent, and we're trying to keep the bad routes from causing a problem.

Cengiz: It's not about good or bad; the concern is that bugs will go unnoticed vs. a session reset, which will get noticed.

Enke: Removing bad routes has the same effect as a session reset, except it has the benefit of preserving the good ones.

Cengiz: But a session reset has the advantage of getting the bug fixed really quickly.

Rudiger Volk: We don't like to see errors break the whole network and force updates everywhere, so this is the right thing to do.

John Scudder (to Enke): What would you like wrt next steps for the draft?

Enke: Would like to combine this proposal with the current error-handling draft for optional transitive attributes. That doc is currently a WG doc, but this one is not, so the idea has to be adopted to combine the two drafts.

John Scudder: Room shows strong support to adopt this idea. No one objects.

John Scudder: No GROW chairs in the room, but good idea to present to them as well.

John Scudder: Will take this to the list.

Sue Hares (to Enke): Would like to see this in 2 stages:
- validate the idea & problem
- merge with error-handling
... otherwise, difficult to untangle the two.

Chairs: Please take this to the list.

o draft-keyur-bgp-enhanced-route-refresh-00          10 minutes
  Enhanced Route Refresh Capability for BGP-4
  Keyur Patel

Motivation:
- Current Route Refresh doesn't provide any kind of table demarc
- Table demarcs are useful for consistency checks (missing w/draws) & table measurements

Enhanced Route Refresh Capability:
- Define 2 new sub-types of the Route Refresh message: Start of RIB (SoR) & End of RIB (EoR)
- When you receive RR SoR, you announce the whole RIB; when done, send RR EoR
- RR EoR marker could be used by the receiver to purge all 'stale' routes, or to know the xfer time of the table transfer

Extreme churn & RR EoR:
- Expect RR EoR is generated once the table announcement is complete
- RR EoR could be delayed for hours if lots of churn is going on. Result is a delay in purging stale routes, but it doesn't have any downsides.
- Potential solutions:
  + delay sending RR EoR, but this delays purging of stale routes
  + implementation-specific timer to generate RR EoR after a specified time interval, similar to the stale path timer in graceful restart
  + throttle inbound update processing until the peer sends you RR EoR when it's done sending the whole table
  + these mechanisms are in the draft, only suggestions & not mandatory

Questions:

Richard Li (Huawei): Race condition: if I send 2 start messages and 1 end, is the end associated with the 1st start or the 2nd start? Disconnect between refresh start and end.

Keyur Patel: RR start & end are not connected & their behaviors are completely different. They don't need to be connected.

Richard Li: If you recv EoR and still have stale or withdrawn routes that aren't refreshed, you could keep waiting forever for those routes to get refreshed.

Enke Chen: When you see the 2nd start, you cancel the first start and start over again (one start will cancel the first).

Richard Li: 2nd RR SoR overrides 1st SoR?

Enke: Yes.

Richard Li: Think this isn't in the spec.

John Scudder: Recommend the spec be clarified for the case where you have multiple RR SoR messages.

Keyur: Sure.

John Scudder (to Keyur): What do you want to do with this draft?

Keyur: Like to see it as a WG doc.

John Scudder: Would like to take this on as a WG draft: a lot of hands. No objections. Will take it to the list.

o draft-keyur-af-specific-rt-constrain-00            10 minutes
  AFI Specific Route Target Distribution
  Keyur Patel

Motivation:
- Current RT constrain mechanism filters all VPN AFI/SAFI's
  + Suboptimal when ops want to use AFI-specific RT's
- Current RT constraint mechanism only supports pfx'es with max-length equal to 12 bytes
  + Need 24 bytes for IPv6 RT's in RFC 5701
- Need to be backward compatible w/ RFC 4684

AF Specific RT Constrain Cap.:
- 2 new AF-specific RT Constraint Cap.:
  + IPv6 RT Constraint Cap.
  + L2VPN RT Constraint Cap.
- RT pfx'es with these RT constraint AFI/SAFIs are used to filter just that addr. family
- VPN AFI/SAFIs that don't use these RT constraints use the default RT constraint AFI/SAFI

Ext. Pfx RT Constraint:
- Allow pfx length up to 24B
- Sep. capability for each RT Constraint VPN AFI
- Fixed-length pfx'es of 4, 12 & 24B

Questions?

____ from China Mobile: Function is very useful for IPv6; would like to recommend moving the draft forward.

Enke Chen: All policies defined so far are AF-specific. AF filtering = AFI + SAFI. If SAFI is fixed, what do you do wrt filtering on all the other combinations of AFI + SAFI?

Keyur: 2 things:
1) Wanted to make minimal changes to RFC 4684, covering only the AF's that need their own RT Constraint.
2) All the other VPN address families, e.g., the multicast ones, follow the unicast path, so having them use the base RFC is OK.
... that's why we didn't go the SAFI-specific way. But if the WG says we need SAFI's, we're OK with inserting them.

Rahul Aggarwal: Decouple the address families. The assumption that multicast address families follow unicast families may not be true. Skeptical of this behavior; need to go down to an AFI/SAFI basis.

Rahul Aggarwal: Possible to allow IPv6-specific RT's on the existing RT constraint machinery?

Keyur: Yes.

Rahul: In that case I'd recommend decoupling them. Filtering on IPv6 RT's is useful, but orthogonal to filtering on AFI+SAFI.

Keyur: Agree.

Rahul: Multicast AF's may not follow unicast AF's. Skeptical about the problem of needing AFI or AFI+SAFI RT Constraint; but if you do this, then RT constraint on just AFI is pretty useless. Going to need AFI+SAFI.

Keyur: Doing it on AFI+SAFI is OK.

Robert Raszuk: Question for Rahul. Asked: is it possible to carry IPv6 RT in the current RFC?

Rahul: No. What I asked was about the procedures Keyur talked about: once you extend them for IPv6, I'd like to see that decoupled from the AFI+SAFI extension.

John Scudder: Request to make this a WG doc?
1) Should we work on support for v6-specific RT's?
   Yes: mild interest; No: ???
2) Should we work on AFI/SAFI binding of RT's?
   Yes (binding to specific AFI/SAFI): few hands; No: few hands.

John: ... not overwhelming support to move forward with either one. Clearer support for the v6-specific one, but will take it to the list.

Jakob Heitz: Not convinced that AFI/SAFI-specific is solving a problem.

John Scudder: We did have an operator step forward this time to support this work. In the past, haven't heard from any carriers that needed AFI/SAFI RT's. Most implementers have said they don't see the use case, and this hasn't gone anywhere. Encourage other operators to step forward and state that this is a problem, to move this discussion forward.

The following 3 drafts were presented as a single group, with questions taken after all 3 were presented.

o draft-raszuk-bgp-optimal-route-reflection-00       10 minutes
  BGP Optimal Route Reflection
  Robert Raszuk

Problem statement:
- RR's compare the IGP metric to the NH during best-path selection based on where the RR is in the topology. RR's not in the forwarding path could select a NH that isn't the client's shortest exit from the AS.
- Difficult to achieve hot-potato routing
- If RR's are in the data plane or on the topological path, they won't have this problem: not broken, so no need for this fix. However, RR's are typically not in the fwd'ing path, so a fix to this problem is needed.

Solution:
- Calculate a customized BGP best path for a given client or group of clients, based on the IGP metric from the RR client's PoV.
- No change to the BGP best path algorithm.
- What might the metric be? Could be IGP distance, BW, delay or something else that the client(s) use. Up to the operator which 'metric' to use.
- LOCAL_PREF & MED are still honored before the IGP metric to the NH
- Applicable to best-path, 2nd-best, add-paths, etc.
- How do we find the right metric value?
Option A: Link State IGP's
- Every node has a complete view of the entire domain
- RR can pretend to be the RR-client to calculate the best-path from the PoV of that RR-client (or set of clients)
- In the hierarchical IGP case, centralized RR's can only see as far as the ABR's, but it still helps get you closer to the client's ABR's

Option B: Next Hop Information Base + NH SAFI
- RR's ask RR-clients for the metrics of the next-hops the clients are using
- Possibly use a new SAFI for the RR-client to communicate this to the RR
- Only need the RR-client to communicate IGP NH's, not customer NH's

Option C: Angular position approximation
- Map RR-clients to positions on a virtual circle & assign an angular position to each client. Calculate the delta between client & NH during BP selection.

Flexible logical RR implementation/enhancement:
- Option A allows you to temporarily change RR placement in the network during, e.g., network maintenance
- Could be done for a whole RR or a set of RR-clients of 1 RR

Conclusion:
- Enables operators to manage their best path selection policy within the AS
- Easier migration for operators to an Internet-free core

o draft-raszuk-bgp-diagnostic-message-00            10 minutes
  BGP Diagnostic Message
  Robert Raszuk

Goals:
- Enhance current troubleshooting tools, esp. on eBGP boundaries
- Detect routing inconsistencies
- Enable a new way of error signaling when the transmitter sends you a malformed update
- Enable visibility into installed RT filters

Encoding:
- New BGP Message Type: BGP Diagnostic Message
- 128 octets bigger than any other BGP message, so it can encode an entire update

Diagnostic Message TLV's:
- Operational TLV's: how to handle the exchange of requests, either manually or periodically
- BGP Database Counter Exchange: # of paths, prefixes, etc. xmit'd/rcv'd
- AFI/SAFI signaling when a malformed update is rcv'd, to signal ignoring of certain AFI/SAFI's that experienced errors
- Pfx-specific BGP debugging
- BGP Decision Monitoring: # of IGP metric or BGP best path tie breaks
- Monitoring of installed RT filters

Conclusions:
- Tool to simplify troubleshooting
- Very lightweight, because it's mostly based on counter exchange
- Indicates to peers when a malformed attribute is detected

o draft-raszuk-wide-bgp-communities-01               10 minutes
  Wide BGP Communities Attribute
  Robert Raszuk

Introduction:
- Previous IETF recommended splitting the draft into two & working on them independently:
  + 1st for encoding
  + 2nd for 'registered' communities
- Other authors are also on the flexible comms draft, so this draft will have to solve for its use cases as well.

Goals:
- Simplify the required BGP policies in PE's
- Today can only mark routes; cannot say that a route with this set of values should be considered in policy, or not.
- Provide the ability for ops to use their own definition of communities

Encoding:
- New BGP attribute that is TLV-based
- Not intended to replace standard or extended BGP communities
- Added TTL field in this version of the draft
- R flag: registered or local use. C flag used for the same thing across confed boundaries.

Use cases:
- Encoding for prepend 4 times when adv. to two ASNs

Wide BGP Communities:
- Not intended to replace std or ext communities
- Opt. & Transitive Attr.
- Two container templates defined, for fixed & variable length

Sister draft: draft-raszuk-registered-wide-communities
- Documents 'standard' registered wide BGP communities
- Encapsulation draft to progress as Standards Track
- Registered types to progress as Informational

Questions?
- Jakob: ORR draft: the next-hop DB option could cause more traffic than add-paths, b/c of the extra traffic to propagate NH's.
  Also, when a router starts up it will have double the convergence time, because the RR needs to learn all the NH's from the restarting rtr before sending any updates for your routes.
- Robert Raszuk: Re: add-paths, pushing 15 paths is more work than pushing NH's. NH's are always reachable in the IGP, so you don't have to wait.
- Adam Simpson: Would like to see more analysis of Options A & B in ORR, because concerned about convergence time during churn.
- Robert: Option B isn't in the draft, but in a separate doc. May merge it into the draft, but should address concerns first to see if it's worth it.
- Adam Simpson: Surprised that, of the two ways of sending the diagnostic message, 1 is based on request and the other is automatic. Expected to see sending based on a trigger/event, e.g., upon detecting an error.
- Robert Raszuk: Request is for synchronization of DB's, but for error conditions it would be triggered.
- Matt Meyer: First draft: want things to converge quickly at the edge rtrs. Why not just have a full mesh within the POP and avoid this issue altogether?
- Robert Raszuk: Full mesh inside the POP is OK; however, it's not optimal for exiting outside the POP.
- Matt: What encapsulation are you assuming?
- Robert: Any type of encapsulation: LDP, TE, softwire, etc.
- Rudiger: Re: wide communities. Disagree that current policies magically become simpler as a result of this draft, because policies are not just if/then statements. Policies also define what actions I allow a certain partner to trigger & that's not going away. For the diagnostic message: don't like the idea of inflating functionality & data flow within the BGP protocol, particularly because it could delay propagation of reachability information. Would suggest looking at other proposals/protocols to export this information outside BGP.
- John: Are you thinking of MIB's?
- Rudiger: SNMP is not a good way to get this info out of BGP. BMP is probably better -- recommend extending it, if it doesn't have this info yet.
- Albert: Re: ORR -- add-paths is prob. the right solution.
  2nd draft: going down the path of remote computation is complex, because multiple areas mask info and you don't get any benefit.
- Robert: If the network is a flat IGP, then this would be OK.
- Albert: That's not always the case.
- Albert: Re: diagnostic draft, there isn't a separate RIB-In & RIB-Out in some implementations.
- Robert: Agreed that we need both to figure things out.

o draft-jasinska-ix-bgp-route-server-01              10 minutes
  Internet Exchange Route Server
  Elisa Jasinska

Introduction:
- Want to get feedback from IDR on the details in this doc

IXP Interconnection:
- IXP's are a Layer-2 platform for ISP's to interconnect in one place.
- AMS-IX has 400 members & if everyone wanted to interconnect, that would result in 80K separate eBGP sessions.

Motivation:
- 3 open source route server implementations, but no reference or documentation. Result is subtle variations between the implementations.
- -00 presented in GROW at IETF 78.

Proposed Doc:
- Tech. considerations:
  + attribute transparency: has implications on the BGP protocol
  + per-client filtering
- Op considerations: scaling, NLRI leaks, redundancy, etc.
- Seeking feedback from IDR re: slight changes to BGP required by this doc
- Looking for WG doc in GROW

Questions:
- John Scudder: Sue & John to talk to the GROW chairs to figure out which WG should take this document. Question: not clear if you want a Std Track doc?
- Elisa: Standards Track as of -01.
- John: Normative stuff belongs in IDR; Informative can go to GROW? Also, need to talk to the IESG to take on this work, because the Normative stuff in Section 4 isn't currently in IDR's charter. Motivation & operational in GROW.
- John: Show of hands: who has read it & thinks IDR should take it on? Several people raised hands.
- Elisa: Recommend splitting up the doc for IDR & GROW?
- Sue: Yes.
- Jake Khuon: Doc hasn't addressed partial-mesh peering. Solved in the past with filters in RPSL.
- Elisa: There is an entire section on filtering in the draft.
- Jake Khuon: No discussion of a coarse-grained peering mesh. Observing communities might be an option?
- Elisa: The entire point of a route-server is to put policy into the route-server in order to ensure that there isn't a full mesh between all participants. Not sure I understand your point?
- Jake Khuon: Next question: operators need the ability to perform diagnostics on routes they send to & receive from the route-server. Possibly look at extending SNMP MIB's or BMP for this information.

o draft-decraene-idr-reserved-extended-communities-00   8 minutes
  Reserved BGP extended communities
  Bruno Decraene

- Get a reserved, non-transitive community for Graceful BGP Shutdown.
- Issue: no IANA registry from which to get a non-transitive community
- IANA only has a registry to allocate well-known communities

Proposal:
- 1 BGP Ext. Comm: extended, non-trans.
- Create a new registry @ IANA to allocate these communities from.
- Optionally propose to do this for trans. comms as well.

Summary:
- No proto ext in this draft, but 2 potential IANA actions

Questions?
- Enke Chen: BGP graceful shutdown wants to use a community to signal the peer to change its local-pref on routes. There shouldn't be a need for a reserved BGP community. Just need to limit the scope of the comm. to local rtrs.
- Bruno: Need to ensure the comm. is removed.
- Enke: The peer that recv'd it should remove it.
- Sue: The draft requested an IANA registry. You're not happy with that?
- Enke: Good to have a registry; however, the intent of using a community for BGP gshut is not good.
- John Scudder: This is a tangent. Question: anyone against creating an IANA registry? 1 person raised a hand.
- Enke: BGP graceful shutdown can be solved without a registry.
- Bruno: What if the eBGP peer doesn't recognize the comm and doesn't remove it? In that case, the community will be propagated through the entire Internet, which is not intended.
- John: Since we didn't have unanimous consent to move forward, take it to the mailing list to discuss.
- Bruno: Please comment on the GROW list.
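Bruno's concern is a peer that doesn't recognize the community and so never strips it. The reason a *non-transitive* extended community helps here is that RFC 4360 reserves a Transitive bit (0x40) in the high-order Type octet of every extended community; when it is set, the community SHOULD be removed before the route crosses an AS boundary, so a speaker can strip it without understanding the specific sub-type. The sketch below illustrates that check on raw 8-octet values. It is not taken from the draft; the function names are hypothetical, and the second example value is a placeholder, not an assigned graceful-shutdown community.

```python
from typing import Iterable, List

# Per RFC 4360, the high-order octet of an extended community is the Type field.
# 0x80 is the IANA-authority bit; 0x40 is the Transitive bit
# (0 = transitive across ASes, 1 = non-transitive).
TRANSITIVE_BIT = 0x40


def is_non_transitive(ext_comm: bytes) -> bool:
    """Return True if an 8-octet extended community is marked non-transitive."""
    if len(ext_comm) != 8:
        raise ValueError("extended communities are exactly 8 octets")
    return bool(ext_comm[0] & TRANSITIVE_BIT)


def strip_for_ebgp(ext_comms: Iterable[bytes]) -> List[bytes]:
    """Hypothetical helper: drop non-transitive extended communities
    before advertising a route across an AS boundary (eBGP)."""
    return [c for c in ext_comms if not is_non_transitive(c)]


if __name__ == "__main__":
    # Type 0x00 / sub-type 0x02: two-octet-AS Route Target (transitive), AS 65000.
    rt = bytes([0x00, 0x02, 0xFD, 0xE8, 0x00, 0x00, 0x00, 0x64])
    # Type 0x40: non-transitive two-octet-AS type; remaining octets are placeholders.
    local_only = bytes([0x40, 0x00, 0xFD, 0xE8, 0x00, 0x00, 0x00, 0x01])
    print(strip_for_ebgp([rt, local_only]) == [rt])  # prints True
```

Because the decision depends only on one bit of the Type octet, it works even for community types the speaker has never seen, which is the property a reserved non-transitive value would rely on.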
o draft-zeng-one-time-prefix-orf-00                  10 minutes
  One-time Address-Prefix Based Outbound Route Filter for BGP-4
  Jie Dong

Introduction:
- During network maintenance, operators may need to retrieve specific routes from peers & are concerned about the overhead of a whole RIB-out advertisement.

1-time address-specific ORF:
- Propose a new 1-time addr-spec. pfx ORF for 1-time selective refresh

Further Ext:
- This ORF is mostly used for troubleshooting & recovery of specific routes. Other types of ORF could also be defined, e.g., for 1-time AS_PATH, etc.

Questions?

Rudiger Volk: Doubt that this is useful, particularly as a troubleshooting tool. If there are doubts about the consistency of routes between peers, then it's likely that we don't know which routes are the problem. For troubleshooting the exact attributes that were on an announcement, triggering a peer to resend the announcement so you can see what the problem was is not optimal. Better if the implementation logged the problem.

Enke: Share Rudiger's concern that you won't be able to identify the specific route that is causing the problem. Enhanced refresh is the way to go, because it enforces consistency between peers. What's being proposed here is a subset of enhanced route-refresh.

Jakob Heitz: Would like to see this combined with enhanced route-refresh. Other comment: use this & route-target as a filter to get a 1-time refresh for a particular RT. Would like to see a refresh for a route-target.

Jie: Agree.

o draft-uttaro-idr-add-paths-guidelines-03           10 minutes
  Best Practices for Advertisement of Multiple Paths in BGP
  Adam Simpson

Draft Status:
- Initial versions were Informational; proposing to move to Stds Track w/ -03
- Ops need at least 1 mandatory path selection algorithm that is common across implementations.

Mandatory Path Selection Mode:
- Advertise N paths; N should be configurable, default = 2.
- Why? Simple; can satisfy any reqm't if N is config'ed properly (fast failover to optimal backup path, load-balancing, MED oscillation, etc.)

Optional Path Selection Modes: still mentioned in the draft
- All paths: useful for path monitoring
- AS-wide best paths & AS-wide 2nd best paths: ensure availability of backup paths

Historical Path Selection Modes: have been moved to an Appendix

Other Changes: include Deployment Considerations re: upper bound on # of adv. paths, benefits of encaps to improve consistency

Next Steps:
- Interaction between Best-External & add-paths in terms of which takes precedence. Probably add-paths takes precedence. Seeking input on the mailing list.
- Add-Paths for eBGP sessions? Not sure there's a need. If so, do it in a separate draft.
- WG adoption?

Questions?
- John Scudder: Call for adoption finished yesterday and there was good support, so please submit it with the appropriate title.
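The mandatory mode presented above ("advertise N paths, N configurable, default = 2") amounts to taking the first N entries of the candidate list once it has been ranked. A minimal sketch, assuming a heavily simplified ranking (highest LOCAL_PREF, then shortest AS_PATH, then lowest IGP metric to the next hop) in place of the full RFC 4271 decision process; the `Path` type and `select_paths_to_advertise` name are hypothetical and not from the draft:

```python
from dataclasses import dataclass
from typing import List

DEFAULT_N = 2  # default number of advertised paths in the mandatory mode


@dataclass(frozen=True)
class Path:
    """Hypothetical, simplified view of one BGP path to a prefix."""
    next_hop: str
    local_pref: int
    as_path_len: int
    igp_metric: int


def rank_key(p: Path):
    # Simplified stand-in for the BGP decision process: higher LOCAL_PREF first,
    # then shorter AS_PATH, then lower IGP metric to the next hop.
    # MED, origin, eBGP-vs-iBGP and router-ID tie-breaks are deliberately omitted.
    return (-p.local_pref, p.as_path_len, p.igp_metric)


def select_paths_to_advertise(candidates: List[Path], n: int = DEFAULT_N) -> List[Path]:
    """Return the best N paths (best path first) to advertise via add-paths."""
    if n < 1:
        raise ValueError("N must be at least 1")
    return sorted(candidates, key=rank_key)[:n]


if __name__ == "__main__":
    paths = [
        Path("192.0.2.1", local_pref=100, as_path_len=3, igp_metric=10),
        Path("192.0.2.2", local_pref=200, as_path_len=4, igp_metric=50),
        Path("192.0.2.3", local_pref=100, as_path_len=2, igp_metric=30),
    ]
    # prints 192.0.2.2, then 192.0.2.3
    for p in select_paths_to_advertise(paths):
        print(p.next_hop)
```

With the default N = 2, each receiver gets the best path plus one backup, which is enough for fast failover; a larger N trades more state and update volume for more alternatives, which is why the draft's Deployment Considerations discuss an upper bound on the number of advertised paths.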