IDR Working Group Meeting: March 22, 2010 (15:20-17:20)

Note takers (merge done by Sue Hares):
* Roberto Fragassi (Roberto.Gragassi@alcatel-lucent.com)
* Daniel King (daniel@olddog.co.uk)

Chair status report
Slides: http://tools.ietf.org/agenda/77/slides/idr-1.pdf
Susan Hares & John Scudder
* One new working group document: draft-chen-bgp-ext-opt-param.
* Last call completed on draft-ietf-idr-link-bandwidth. Revisions will be required.
* Completed recharter: Continue to make BGP scalable and reliable. (See slides for specific work items and milestones.)
* Scalability work is important. Internet drafts concerning scalability improvements and results are most welcome.

Best Practices for Advertisement of Multiple Paths in BGP
Draft: http://tools.ietf.org/id/draft-uttaro-idr-add-paths-guidelines-00.txt
Slides: http://www.ietf.org/proceedings/10mar/slides/idr-0.pdf
Presenter: Adam Simpson
Discussion:

(Dan King notes)

Adam Simpson: We are looking for comments and feedback on the draft.

Rolf Winter: I did not read this draft, but I am aware of other drafts [draft-vvds-add-paths-analysis] that address [Slide 6?] which paths to advertise. They discuss selection and oscillation and when to use which mechanism. Why ask the same question in your draft?

Adam Simpson: We recognize there are various methods to solve the issues discussed in this draft. We believe add-path is an additional tool. We do not believe enough guidance is provided for using this tool.

(Roberto Fragassi's notes)

John Scudder: The area of Add-Paths extensions to BGP has been an IDR topic for a while. There are already existing drafts; are you looking to cover the gaps in those drafts? The use cases for the add-paths work are: faster failover, routing oscillations, quicker convergence.

Adam Simpson: We feel that add-paths-03 lacks detail about the use cases which are implemented and controlled by operators.
The goal of this draft was to: a) introduce the concepts, b) minimize the impacts, and c) balance the benefits against the costs. This draft attempts to build many cases in an RR (Route Reflector) topology. This draft examines:
* Node impacts: RIB-IN and RIB-OUT complexity, and the state needed to keep track of additional paths per prefix.
* Network impacts: less churn.

Additional captured discussion (Roberto Fragassi)

(unknown) Question #1: How should BGP limit paths per prefix? Are you suggesting globally, per peer, or per prefix? Send vs. receive limits? Along with routing consistency throughout the AS domain?

Adam Simpson: It is better to put limits on the transmit side.

(unknown) Question #2: Which paths to advertise? How are add-paths and multi-paths related?

(Ralph?): What does add-paths solve?

Adam Simpson: A lot of the problems we touched upon are addressed elsewhere. We are not saying it is the only solution. However, it is a good tool and there is value added in the recommendations listed here.

John Scudder: The draft Rolf [Winter] mentioned was presented at the previous WG session. You [Adam Simpson] should review that draft as there may be some overlap.

Adam Simpson: We will look at this. We can work on collaboration with the prior draft.

John Scudder: You should look at use cases and ensure interoperability, at least to the extent that we need documentation to make interop possible. Where you see this issue, it's good for this type of document.

John Scudder: I have an additional comment. Analysis and deployment guidelines are great for informative documents. Where something is required for interoperability, it would need to be in a normative standard. I welcome work to identify interoperability issues.
Subcodes for BGP Finite State Machine Error
Draft: http://www.ietf.org/id/draft-dong-idr-fsm-subcode-00.txt
Slides: http://tools.ietf.org/agenda/77/slides/idr-2.pdf
Presenter: Jie Dong
Discussion:

Jie Dong: This draft is split into two different parts: BGP Finite State Machine (FSM) sub-code additions and TCP errors (sub-code 4). We are discussing just the FSM error codes.

John Scudder: Is this a problem the WG would like to solve?

Enke Chen: In the past I have worked on implementations, and the specification provides a lot of flexibility. In my experience it's infrequent to have FSM errors. How often do FSM errors actually occur, and how many implementations care?

Jie Dong: The problem may not occur that often, but it may happen with new implementations or BGP extensions.

John Scudder: (With chair hat off) If we have a notification type, it seems reasonable to have sub-codes that may provide more information. If an implementation does send a notification, it can provide some additional information. If your implementation does not send an FSM error notification, then the point is moot.

Andrew Lang: How often does this actually happen? Most BGP implementations are fairly mature. If you have an FSM error then there are some serious issues with your implementation. I agree with John [Scudder] that if additional information is available, it is useful to have and use for troubleshooting.

Enke Chen: How can you be sure that the error is not due to the local implementation [the local speaker complains about an FSM error]? What would be the cause of the error? Is it due to a local issue or the remote?

Susan Hares: So just to clarify. There are multiple FSM items for clarification; the first is that the definition of the FSM error in the original state machine has no sub-code. If the sub-codes are needed for other types of notification-ceases, then these codes are useful in the state machine.

Enke Chen: Let me give an example. I expect a message in this state.
OK, I have not moved my local state to the Established state but I received an update. It could be that the remote is wrong, or it could be a local problem. If you have multiple threads, perhaps you are not processing messages correctly. I really wonder if this [reporting FSM errors] can be achieved correctly.

John Scudder: Personally I think that, just like all errors in the BGP spec, you deal with errors to the best of your ability and you send the relevant error messages. You could say exactly the same thing about an update error. If I send you a malformed-update notification, was the update really malformed or was it parsed incorrectly? These issues are intractable; you just deal with them as best as you can and move on.

Pradosh Mohapatra: Regarding the TCP message issue you mentioned, I was wondering if you considered the advisory message.

Unknown: I agree that the TCP example looks more like an implementation issue. Having said that, it does make sense to have the FSM codes mentioned in this draft.

Albert Jin: Is this a lab environment and debugging messages requirement, or is this a field requirement?

Susan Hares: Are there additional comments? We do not have a full agenda, so we have time.

Albert Tam: This is a question for the operators. Is this an error that we are expecting to see in the field, or is this for lab or development? If it is for the lab, you can troubleshoot with debug messages. For the field, I can see the usefulness.

Ruediger Volk: You called for the operators [Ruediger is from Deutsche Telekom]. I do not remember seeing any case where this is likely to have helped. Not all the issues are reported back to engineering and analyzed. There may have been cases where this was useful. I do not see any urgent need, but on the other hand having more information about failures is helpful. Concerning Enke [Chen]'s question, in a distributed system you are never sure which side causes the problem or if it is just misinterpretation.
I am not sure if in the next decade I see this being useful for my network.

Susan Hares: Jie [Dong], did you want to give any input from your providers in China? Jie is bringing this forward because some of his customers have greenfield networks and they have had these problems. So this is a day-to-day issue.

Enke Chen: That's fine if people have these problems. As John [Scudder] mentioned, if you have more information then it's fine. I have one comment: maybe you need to take care of graceful restart.

Susan Hares: Since we are working towards a long-term BGP standard, is the graceful restart FSM state sufficient in the document? If the state machine write-up is not sufficient, please let us know.

John Scudder: We had no request for adoption, but it seems like there is interest in the document. From the comments at the mike, there is interest in the sub-codes. For the TCP errors there was really not a whole lot of feedback, pro or con. How many people have read this draft? [A few]. I would recommend you read it, consider the TCP functionality, and provide comments on the list.

Pradosh Mohapatra: On the operator feedback, is this an issue from the operators?

Ruediger Volk: Regarding TCP problems, I have seen problems with home-baked BGP. Any additional diagnostic info can be helpful.