Meeting Minutes

Note takers:
Bhavani Parise
Rob Shakir

Compiled and edited by John Scudder

Agenda:

Interdomain Routing (IDR) WG
THURSDAY, March 31, 2011
1300-1500 Afternoon Session I
Congress Hall I
=====================================================
CHAIR(s): Susan Hares
          John Scudder

o Administrivia    5 minutes
  Chairs
  - Note Well
  - Scribe
  - Blue Sheets
  - Document Status

o draft-shakir-idr-ops-reqs-for-bgp-error-handling-01
  Operational Requirements for Enhanced Error Handling Behaviour in BGP-4
  Rob Shakir
  10 minutes

o draft-gredler-bgp-te-00
  Advertising Traffic Engineering Information in BGP
  Hannes Gredler
  10 minutes

o draft-l3vpn-legacy-rtc-00
  Automatic Route Target Filtering for legacy PEs
  Alton Lo
  5 minutes

o draft-keyupate-idr-bgp-gr-extension-00
  Graceful Restart Extensions for BGP
o draft-ymbk-bgp-extended-messages-01
  Extended Message support for BGP
o draft-ymbk-rfd-usable-00
  Making Route Flap Damping Usable
  Keyur Patel
  15 minutes

o draft-zeng-idr-one-time-prefix-orf-00
  One-time Address-Prefix Based Outbound Route Filter for BGP-4
o draft-dong-idr-one-time-ext-community-orf-00
  One-time Extended Community Based Outbound Route Filter for BGP-4
o draft-keyur-bgp-af-specific-rt-constrain-01
  IPv6 AF Extensions for Route Target Distribution
  Jie Dong
  20 minutes

o draft-raszuk-idr-flow-spec-v6-01
  Dissemination of Flow Specification Rules for IPv6
o draft-raszuk-bgp-diagnostic-message-02 (update)
  BGP Diagnostic Message
o draft-raszuk-bgp-optimal-route-reflection-01 (update)
  BGP Optimal Route Reflection (BGP-ORR)
o draft-raszuk-wide-bgp-communities-01 (update)
  Wide BGP Communities Attribute
  Combined slide set
  Robert Raszuk
  20 minutes

o draft-varlashkin-bgp-nh-cost-00
  Carrying next-hop cost information in BGP
  Ilya Varlashkin
  10 minutes

o draft-retana-bgp-custom-decision-01
  BGP Custom Decision Process
  Alvaro Retana
  10 minutes

o draft-bashandy-idr-bgp-repair-label-00
  Scalable, Loop-Free BGP FRR using Repair Label
  Ahmed Bashandy
  10 minutes

Speaker shuffling time 5 minutes
Total 2 hours

Introductory Remarks

Note well - everything is public here. Please remember this.

Working Group Status: Progress has been lacking on a few documents - a number of WGLC and adoption requests have gone to the mailing list. In addition to 4893bis being updated, there is also a request for a code point from Bruno. A couple of drafts are going towards the IESG. In an effort to reduce the dropping of drafts, Weesan Lee has agreed to be the IDR WG secretary, and will help with the admin workload.

Rob Shakir: BGP-4 Error Handling
--------------------------------
Problem statement - NOTIFICATIONs affect multiple AFIs and NLRIs.

John Scudder: As to which WG is suitable, John suggested GROW might be the more suitable one.

IDR accepted the error handling draft.

Ruediger asked how we are dealing with this: is all relevant information available, are we taking everything into account for reporting, and are we increasing the complexity?

Rob replied that if we change error handling it will get complicated, and that even if we use multiple sessions we will still increase operational complexity.

Advertising TE Information in BGP (draft-gredler-bgp-te-00)
-----------------------------------------------------------
Certain applications need a complete, link-state view of the topology to do route computation and selection. One application of this is the 'alto' server. Complete topological data is interesting for these applications, to guide media requests. GRE and IGP topology hacks are one alternative, but alto servers are to be scaled to multiple thousands, which means that we introduce errors. Some IGPs have views of the IGP - can we extend these? Other applications include MPLS-TE: when spanning multiple domain and IGP boundaries, we need a PCE TED synchronisation protocol.
Want to convey attributes of the links and paths of a single administrative domain - do not want to disseminate link-state information outside of the admin domain - just allow the receiver to do link-state.

BGP is used as an opaque carrier for transporting TE information. This information is carried in MP_REACH/MP_UNREACH - a single link, with the anchor router-ID, is contained within the NLRI. There are three classes of TLV: node anchor, link descriptor, and link attribute. The first two are the key of the database, and link attributes are the key data. Anchor TLVs describe the nodes where a link connects. Link descriptor TLVs are the local/remote IPs, these are the IGP metric and other issues.

Proposing that we have two SAFIs: SAFI = 1 for the default topology, and SAFI = 128 for VPN applications. Where Carrier-supporting-Carrier is deployed, the complaint is that there is no way to see the link data needed to signal FRR paths, and this would be required. Local and remote ASN are also included to disambiguate IP addresses where they are ambiguous across domains (such as those from private IP addressing).

Have already had healthy feedback; a -01 version is soon to be published - key of the NLRI and then describe the path selection. Also, one may expose internal service provider information, and hence there is a security issue - the draft should be extended to cover this.

Ilya Varlashkin: Do you plan to rely on the 'alto' server to use this information, or would this be done on the router?
A: The plan is that this would be used for alto, and then for inter-area TE-LSP information.
Ilya: Is the proposal to flood this information across the ASN? Is there provision for this to be restricted - sometimes there is a requirement for stitching?
A: There is path aggregation in the draft, so this can be used to restrict the information or topology that is to be exposed.
Ahmed Bashandy (Cisco): Is this information going to be transitive across AS boundaries?
Are you going to make BGP a link-state protocol?
A: The only use that BGP, as a client of this information, makes of it is to make labelled BGP path selections.
??? (Cisco): You will add link-aware path selection?
JohnS: Take to list.
PCE WG Chair: There is a standardised mechanism to do this developed by the PCE WG, why do we need anything further than this?
A: How many are deployed?
Q: 8
A: None of these protocols give a scalable way to deploy this information; there is no reflector or server mechanism, and BGP does this correctly.
Q: The reason that this works is that it does not flood, and hence scales.

Alton Lo: Automatic Route Target Filtering for Legacy PEs
---------------------------------------------------------
Overview of RT constrain - limiting the RTs that are sent to VPN PEs. RFC 4684 deployment has a couple of problems. Where RTC is unsupported on a PE (a legacy PE), the PE has to request and store all VPN routes from another RR, even where they are not required. The second problem is that the RR must advertise all routes to the PE.

The draft is a simple idea for how to solve this issue - a legacy PE must advertise a special set of RTs, in terms of import RTs, to the RR. RR1 then installs and translates these into RT constraint NLRI, which can then be used to restrict the NLRIs.

Mechanism - the draft describes the mapping between the import RT and the actual RT value that is used - simple mapping means that the IRT list is not imported into a VRF.

Lo Wei: Question here. Suppose the network has stabilised, and we need to configure a new VRF - right now, we do not have a message to request routes for a VRF - RR1 will not get any information about the new VRF.
A: The question is, if we don't have the VRF, how do we request it? We use a special VRF with some NLRI with a special RT that indicates that VRF purple would be required.
Q: In the initial case, a special RT is used.
JohnS: This is done with RTC, this should be clarified.
Jeff Haas (Juniper): Why do we need an IRT translation mechanism? Is it not better to use a specific community to indicate that this request is part of this VRF, rather than mapping?
A: The draft does describe using a well known community to stop this importing - the IRT map is there if the operator does not want to use this; then this is a problem.
Jeff: Suggest that you do not do this as this is not required.

Keyur Patel: Graceful Restart Extensions
---------------------------
The current GR mechanism applies only when the session goes down for reasons other than a NOTIFICATION message. Any error then results in a non-graceful restart, which in turn results in disruption to the forwarding. So within GR, the draft defines a flag that we turn on in the GR capability - both peers then move to GR mode when a NOTIFICATION is followed by GR. Since all NOTIFICATION messages are affected, a bit is defined to ensure that there is no requirement for us to be able to provide a NOTIFICATION. When this happens the GR RIB/FIB NOTIFICATION. This draft also defines a new BGP NOTIFICATION Cease Error subcode.

Rob Shakir (C&W): This is very useful, and would utilise this.
David Freedman (Claranet): Sub-code is utilised to indicate that this is hard down - a sub-code is then utilised. How do we keep the usual NOTIFICATION?
Keyur: If we want to turn this on, then we exchange the sub-code; if you don't want it, then don't enable this.
David: We lose the sub-code in the case that we are doing a hard reset.

Extended Message Support
------------------------
- Current message length is restricted to 4096 bytes - we need to be able to carry more information.

Primary motivation is path validation being discussed in SIDR. A new capability is negotiated, extending the length of the message, so that messages of up to 65535 bytes are supported. The other motivation is that we have NOTIFICATION/DIAGNOSTIC, which would mean that a BGP message would be transmitted back to the other neighbour.
We must truncate any message where the message length would be exceeded.

Ilya Varlashkin: We do not need asymmetric behaviour as this does not make sense.
Randy Bush: Original requirement for asymmetry was BGPSec - in this case, a customer may not need to sign and send prefixes upstream, therefore in this scenario does not want to send large messages. I don't care.
Keyur: Reason for asymmetry - if we have problems with encapsulation and the message length is longer than the message type, then it makes sense if you want messages this way.
Ilya: Why do we need this complexity?
Keyur: This is a feature of turning something on in the router - does this imply symmetry is required?
JohnS: Kind of with Randy, doesn't seem like a key reason for asymmetry. The reason for asymmetry is buffer memory, but the use case is CPE, and buffer memory is likely to be used elsewhere.
Randy: Keyur's case was BGP NOTIFICATION and I don't care.
Ilya: Truncate, or leave enough room - we should not let message truncation happen.
Randy: The draft does not say anything about this - the draft should handle this. It's just changing a constant...
JohnS: The encapsulation of the message is orthogonal to this issue - we would want to cover this when the message is max length already.
Jeff H: We already truncate. The recommendation would be to send nothing longer than the maximum with some room.
Ilya: But this truncation causes me operational issues.
Jeff H: Recommendation should be to not send messages that would not fit in the message that you are able to send.
Ilya: So I should write another draft?
JohnS: It probably doesn't make sense in this one.
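
The truncation concern in the discussion above can be sketched as follows. This is a minimal illustration and is not taken from the draft: the function names are hypothetical, the symmetric negotiation rule is an assumption (the session debated whether asymmetry is needed), and the 21-byte headroom assumes the original message is echoed inside a NOTIFICATION (19-byte BGP header plus one byte each of error code and subcode).

```python
STANDARD_MAX_LEN = 4096    # current BGP maximum message length, in bytes
EXTENDED_MAX_LEN = 65535   # maximum length with the extended capability
NOTIFICATION_OVERHEAD = 19 + 2  # BGP header + error code and subcode


def negotiated_max_len(local_extended: bool, peer_extended: bool) -> int:
    """Assumed symmetric rule: extended length only if both sides advertise it."""
    if local_extended and peer_extended:
        return EXTENDED_MAX_LEN
    return STANDARD_MAX_LEN


def safe_send_len(max_len: int, headroom: int = NOTIFICATION_OVERHEAD) -> int:
    """Largest message that could still be echoed back whole ("leave room")."""
    return max_len - headroom


def echo_in_notification(message: bytes, max_len: int) -> bytes:
    """Part of `message` that fits in the NOTIFICATION data field.

    A message longer than safe_send_len(max_len) is truncated here.
    """
    return message[: max_len - NOTIFICATION_OVERHEAD]
```

Under these assumptions, a sender that caps its messages at safe_send_len() never has a message truncated when it is echoed back, which is the "send nothing longer than the maximum with some room" recommendation.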
Route Flap Dampening
--------------------
There is no way to protect networks against badly behaving prefixes. This draft recommends adjusting a few RFD algorithmic constants and limits to reduce the high risks with RFD, with the result being damping a non-trivial amount of long term churn without penalizing well-behaved prefixes' normal convergence process. RFD has been turned off due to excessive dampening - is there a change that we can make that reduces the impact of this?

A high proportion of UPDATEs are due to a low proportion of prefixes - look at what this is. With different suppress threshold values on the prefixes that are flapping, then between 5 and 10K, you end up damping those a lower threshold of prefixes. Update rate is reduced by more than 20% with 4-5k compared to threshold - 2K may result in more prefixes suppressed (problem). When the suppress threshold is increased, well behaved prefixes are not damped aggressively - but those that are flapping often are damped.

Geoff Huston: What's a flap?
Keyur: Should I take this to the mailing list?
Randy Bush: The paper and this document say that this is a hack to fix the fact that a bad constant was selected. Geoff Huston is doing work to analyse what is being seen - this is better long term. This is a request to fix this bug?
John S: What's the request to the working group? Draft, not an implementation mechanism?
Randy: This isn't specified, it's just a cap that exists within the document. We should fix this. There is a separate discussion that means that we should change these. ops-cabal said that they should.
JohnS: Are you presenting this in GROW?
Randy: No, there's a document in GROW - but this is not the same.
JohnS: This is the right document to change the implementations?
Randy: Also beat up to change the spec.
Keyur: This is a BCP.
Curtis Villamizar: Consider AS_PATH as the key to the flap, and then this helps a lot.
Geoff Huston: Do we understand enough about this to believe that a default is useful in all cases? Is the behaviour the same everywhere?
JohnS: This is philosophical - we probably don't have time to discuss. Please make sure that there's a reference to the paper in the document.
Joel Ladley: Schedule maintenance events around people's flaps two to three hops away - people will run with the defaults, otherwise they would have already changed this.

Jie Dong: One-time Address-Prefix Based Outbound Route Filter for BGP-4
-------------------------------------------------------------
- Defines a new Outbound Route Filter (ORF) type for BGP, termed "One-time Address Prefix Outbound Route Filter", which would allow a BGP speaker to send to its BGP peer a route refresh request with a set of address-prefix-based filters to make the peer re-advertise only the specific routes matching the filters to the speaker.
- Some scenarios whereby we need to retrieve specific prefixes from peers - for instance where we implement 'treat-as-withdraw'. We can determine the malformed updates - so we can request the prefix for troubleshooting, for example to be able to determine where the fault is. May use this mechanism for route-recovery.
- Route refresh may not be suitable for this - as this is the whole RIB, which may end up with unnecessary route processing and bandwidth overhead. This presents a lightweight operational tool.
- Define a new ORF type - solicit one time use case - used only as a one-time filter, and does not change any existing ORF entry.
- One-time ORF may be used in conjunction with Enhanced RR, to consistency check part of the RIB.

Jeff Haas: One observation on the use case - if the packet is malformed, why do you trust that the NLRI is well formed?
Jie Dong: This is an assumption of the error handling draft.
Rob Shakir said there is a more philosophical problem than the things outlined here.
Ruediger Volk: Not happy with this - he would not like this information.
Would like the information about what the actual problem is. Has not seen where routers have the information about what has been advertised by this mechanism. In some cases, we have to look at the other information.
Rob replied that he needs to work with operators and not vendors.
Randy Bush: The problem is caused by complexity - this is more complexity. The problem is where information is logged - should it be logged to the other router, or to the operator?
John said that we see a bunch of different drafts, all of them kind of circulating in the error handling area - so it would be good to take a step back and discuss this error handling area.
Rob also said this draft was also to trigger interest/discussion in the WG to look at this area.

One-Time Extended Community ORF
-------------------------------
- This document defines a new Outbound Route Filter (ORF) type for BGP, termed "One-time Extended Community Outbound Route Filter", which would allow a BGP speaker to send to its BGP peer a route refresh request with a set of extended-community-based filters to make the peer re-advertise only the specific routes matching the filters to the speaker.
- In this case, we have changing of local policy with specific extended communities - we may need a route refresh of a sub-set of routes. Route refresh may not be very suitable as this sends the whole table. RT constraint may not be suitable in the case that we have an unchanged set of RTs. In this case, the upstream device will not re-advertise. This proposes a new ORF that allows a specific re-advertisement based on an RT. Again, this can be used for consistency validation.

???: Can this be extended to get specific routes based on AFI/SAFI? In some implementations - specific routes from special AFI,SAFI - but do not care about specific community?
A: Current route refresh can get all routes within specific AFI,SAFI.
JohnS: Most of the comments to the previous one should apply to this one also.

RT Constrain for IPv6.
----------------------
Current RT constrain only supports prefixes with a maximum length of 12 bytes, which is not long enough for 5701 extended RTs. Add a new capability to exchange IPv6 RTs - format is shown. Aggregation rules are also defined - so only local admin can be aggregated. Other elements cannot be aggregated.

JohnS: Brief comment - it seems that with v6 RTs, it's useful to have a way to represent these. Individual hat on, this would be supported. Request will be taken to the list.

Robert Raszuk: Extension for FlowSpec
----------------------
Extension for flow specification for v6 flows - flowspec does not support v6 at all. Minor extension of two definitions, and definition of a packet format header - extension of RFC 5575. No SAFI modification, backwards compatible - AFI will show whether this is v4 or v6. This will also change the validation - the extension is there to be sure that this is being validated against the unicast SAFI. For destination and source, have enabled the prefix offset - where there may be embedded addresses - for instance, NAT. Added a field for the (last) Next Header - to be able to look at the data. Would like this adopted as an IDR WG document - proceeding with WG feedback, and an update for the original flowspec RFC.

JohnS: Process comment - as per previous - good comment to add v6 support where appropriate.

BGP DIAGNOSTIC Message
----------------------
Couple of changes - added, based on operator feedback, the ability to query based on a BGP attribute - query a set of prefixes based on the message. Type 19+20. SIDR BGP Origin Validation - this message has been split out into an SIDR draft - up to IDR/SIDR. Added draft-shakir as one of the operational enhancements. Comparison with ADVISORY - of course some similarities, split in terms of being able to do DIAGNOSTIC/ADVISORY having similar issues. Merging of these drafts - some options to be able to do this out-of-band in parallel to in-band. Discuss on the list - and ask for working group option.
Rob Shakir said the option to have out-of-band is fine, but don't lose the in-band option.
A: Robert thanked him, agreed with the comment, and said in-band will not be removed.
Li: Invalid/Not Found - are these the only options?
R: This is just one of the types, there are many for in and out of band mechanisms.

BGP ORR
-------
Option added to group the RRs together - today the RR does this based on common policy; if the operator wishes to group these manually, then this would be an issue. BGP OPEN message optional parameter for the Group_ID which would handle this - to ensure consistent treatment across ASes. Originally presented NH SAFI - to query the client for its view of the next-hop. This has been separated out into Ilya's draft. There are some cases in which this might be applicable to the Gredler BGP TE draft, which is there. Any comments - would like to request WG adoption of this.

JohnS: Please send the WG adoption request to the list.

Wide Communities
----------------
- Chairs decided that there would be a functional and use-case document - so that we can evaluate the proposal without flex-communities.

Added some further parameters: TTL, vs. local and well known - ensuring that people are not locally assigned. Source AS field, to show who originated the community. If there are use cases that are not currently captured - please look, and contribute this back, and then the deployment considerations can be looked at. Authors would like to request working group adoption - things are being worked out with the merge between Flex and Wide comms.

Questions/comments? (none)

Ilya Varlashkin: BGP NH SAFI
----------------------------
Carrying next-hop cost information in BGP

Overview - sometimes IGP information is not available where we want to optimise the RR, or there are cases where we want to decouple the RR decision from the IGP metrics. Currently the RR uses its own view of the world, but this will supply information about costs to routers.
Also has a use case where there are route reflectors - between RRs can use add-path to send all the routes - feasible to optimise to send only partial routes between RRs - could be useful in very large L2VPNs, when sending very large numbers of MACs. Non-IGP cost can also be used - can select costs to edge 3 and R3 that are different to the IGP metric. This is a possible use. Is there general interest, and can we adopt this as a working group document?

JohnS: One question - is there interest? And, is this tied completely to the ORR?
Ilya: Interest for cost information - is there information in general?
JohnS: please break down interest to
Jakob Heitz: When a router boots up, since we don't have all the cost info, how will bestpath selection happen? Do we wait to run bestpath until we get all this cost info?
Ilya: If you do not have complete information, then you may end up with inconsistent route selection, which could then result in loops.
Jakob: Do we get loops right now? If we don't have this now, and we don't get loops, then we will not end up with loops.
Ilya: You can mess this up based on the change in routes.
Jakob: It will mean that we slow convergence time - this will mean that we have to wait until we have got this.
Robert: We do not necessarily need to do this. Run bestpath with whatever you have and then run again for the delta of routes with changed nexthops.

Alvaro Retana: BGP Custom Decision Process
------------------------------------------
- This draft defines a Decision Process for installation of routes into the Loc-RIB. This process takes into account an extensive series of path attributes, which can be manipulated to indicate preference for specific paths.
- Republished the work - -00 was published in 2002.
- Motivation for this is that the BGP decision process is a linear spec - have had cases where they want recursive treatment.
- Cases where they want to select an ISP based on the fact that all the other attributes are changed, but then rather than the IGP, then use one specific ISP.
- This is the cost community, which is an extended community that can be inserted anywhere in the selection process - this is already in IOS.

Things that were changed since 2002 - defined a transitive type (at least one operator that has multiple ASes wants to use this across boundaries). Have a registry in the draft for IANA to handle - some of these are not attributes. In general, this is unchanged. Two implementations are deployed - IOS and IOS XR, these are independent. Would like to ask for WG adoption.

Shane Amante (Level3): This may be a twisted question - the work that is going on in SIDR is going to great lengths to attempt to re-use existing BGP attributes such as local_pref, one suggestion that is not practical within SP networks - this is in use. Question for the authors and the chairs - is this something that we could use to be able to indicate the validity of a path, or is this the right time to be able to do things usefully through RPKI?
A: Specific deployment case with customers and EIGRP - key point is that this is the absolute decision criteria - we use this so that SP attributes are unchanged. Currently used PE-CE to avoid loops.
John S: The answer to the question was yes. Interpretation is that this is a variable length local_pref replacement with "kind of a funny structure".
Shane: Would we be abusing this to use this to indicate validity?
John S: This has the semantics of preference, not validity, which is not necessarily the same. This is your swiss army local_pref.
Shane: So if we're looking at this problem, should we fix it generally?
John S: Preference would be to not touch the path selection mechanism again. If we standardize this then we should make sure we never again have to change or tweak bestpath.
Keyur: This would not be abusing it - it's consistent across the AS.
Shane A: The assumption is that this is done per router - but this doesn't have to be done so - we can use something whereby we transmit this information to other routers that do not have RPKI. If you have problems with distributed RPKI, then you could push this data around the AS, using this kind of mechanism.
Robert R: Strongly support this - two drafts have punched holes in the best path selection - if every draft changes the best path selection, then this will not be something that is interoperable - there is a requirement for this.
JohnS: If you want this to be adopted and extended, would strongly support it.
Ruediger Volk: Not decided whether to go with this, but sure that policy cases exist. It would be nice to have the flexibility to define the criteria. It looks ugly though - how would operations deal with the complexity and lots of code points? For policies that I would want to run, there are specific points of exit that are interesting. Cost very definitely is not an attribute in which one should carry validation state, but the router that did the security validation could post the results of its policy into the cost - as it could with any other attribute.
JohnS: This is what I wanted to say. It would be good to capture the comments regarding manageability considerations when considering this.
Randy Bush (IIJ): Start a betting pool on how long it is until there's a reverse direction signalling pool to be able to request that this is consistent across the AS?
Pierre Francois: Different routers may take different paths - best path with very low value, which may result in inconsistent decisions?
A: This is why we need a consistent decision process.
P: If the value is different at different places, the decision will be different.
A: But what happens if there's not this community - what do we consider based on this? There is no inconsistency and churn.
Ilya: Strongly support this in general, would like to see multiple vendors speaking the same language in the selection process. Not sure whether this is the best way to do it, or whether more information is required. More discussion required - should be a WG document so we can discuss this.

Ahmed Bashandy: Scalable, Loop-Free BGP FRR using Repair Label
--------------------------------------------------------------
"First stab" at defining an LFA for BGP. Want to be able to redirect traffic through the network without waiting for IGP or BGP to converge. Proposing that a PE advertises a repair label as an optional, non-transitive attribute - if a PE decides that it wants repair then it would push the repair label. No repair label would mean that the packet is dropped. Per-CE/NH is considered more efficient, as it is a better way to be able to define this. Chose IDR as the other convergence drafts are discussed in IDR rather than rtgwg - so started by presenting this here.

Keyur Patel: The same problem was presented at the IETF a while back, this should be examined. This looks a cleaner solution.
Hannes Gredler: Where do we want to signal that label?
A: Non-transitive attributes, advertise this as a prefix, add it there.
Gredler: Why not in the IGP?
A: Well, this does not scale - not in the IGP, it's a BGP prefix.
John S: No, the idea here is that this can be done just for the repair with a context label.
Keyur: One could very well do this, but per-NH scales nicer.
Li: There are working items in other drafts that would be of interest.
John S: Please send these references to the list.