Notes from the MPLS WG meeting in San Diego

Loa - agenda bash
-----------------

As usual the agenda is quite packed. This normally means we drop a couple of presentations at the end. So *please* try to keep to time and allow time for questions.

Loa - status
------------

OAM reqs for P2MP now RFC.

MPLS GR in RFC ed queue for long time. now moved from miss ref to ref - so all refs in ed queue.

Number of docs in IESG review: need to make sure that requests for new revs are understood, and why.

MPLS over L2TP has all discusses resolved. So now on Ross's plate.

Batch of 3 LDP drafts for draft standard: old survey will be published. Issued new survey - have been over the responses and are convinced all features in the protocol are implemented and tested by at least two implementations. Had closed survey but please send implementation reports in anyway if you have them.

Ina - please look at last version of draft with comments from IESG.

3 expired docs, all on upstream label allocation. Asked someone in same company as main author. Author on honeymoon so will be back to update later.

status of ICMP draft. traditional update - stuck on draft in the int area so need to ask int area when that's coming along. doc can't change, but can't progress, so keep updating it...

George - can you all read the ICMP draft and comment as to whether we keep moving it forward.

JP - number-0-bw-te-lsps
------------------------

lots of comments from Adrian and a few others (e.g. Acee Lindem on OSPF v3). also one on security issue. will issue new version next week then be done.

Loa - one more comment (about applicability)

JP - will write up to address that comment.

Slides: no slides

George - proxy ping (was "remote ping")
---------------------------------------

issue here is P2MP trees.

1) if have 100s or 1000s of endpoints and ping then tons of nodes have to process it - even if just to see they don't need to respond.
So able to avoid overwhelming the guy at the head-end, but overall network processing can be very heavy.

2) if add previous hop info from mLDP then becomes good tool for tracing backwards.

2 extra messages: proxy request and proxy reply (reply to proxy request, not to ping).

request is request for proxy ping. send to a specific node who then pings down the remainder of the tunnel. Not much use for CCAMP type media, but in packet world can "sneak stuff in" mid-way down a tunnel. so don't need to bother as many nodes with info as with ping from head end.

proxy reply can be asked for to verify that the guy you sent to is actually sending a ping (so you can differentiate between ping not sent - e.g. as you're not authorised to send it to me - and no replies to the ping itself). can also supply PHOP info for backwards tracing.

P2MP-TE tracing. root usually knows topology. so if know something's failing and roughly where, you can ping just down that area. Can also trace by sending pings from different points - so even if don't know topology can learn it from response to each message.

scoping - if you actually know the next-hops beyond things you can say "I'm only interested in you sending towards this particular set of next-hops instead of to everyone".

mLDP tracing - nice tool for these receiver-initiated trees. so receiver wants to know if getting traffic. you can trace backwards by sending proxy ping to node upstream of yourself and asking it to ping towards you. can also ask for upstream hops and then send a proxy ping from there.

is there any interest in this? And can we discuss on list if should become WG item.

Hum vote? Seems a fair bit of support, no opposition.

Slides: Proxy Ping [1]

Adrian - P2MP extensions to LSP ping
------------------------------------

overlaps with what George talked about. Aim is to reuse LSP ping as far as possible for P2MP. Doc has motivation etc. in it (not required but good context). Works for P2MP-TE and mLDP and does ping/traceroute.
Many issues/challenges in terms of number of leaves - draft explains how to address them.

mLDP added in this rev (previously P2MP-TE only as mLDP was still at early stage). Taken material from George's mcast-cv draft (so George/Tom are now co-authors).

3 other changes - bootstrapping section is just discussion, not protocol.

new stuff in echo request is mLDP FEC sub-TLV.

since all leaves send echo response, overload is a big problem, so draft explains (via new egress identifier TLV) how to avoid overload. note that mLDP ingress may not know the leaves. retained jitter mechanism for responses (to avoid overload).

as added mLDP, have issue for traceroute that if you don't know the destination then you can't trace to it! (so don't apply this to mLDP). Also issue with LDP that transit node doesn't know where it is in the tree.

believe main function is stable. bootstrapping also there. but few other things from George's work, e.g. to allow multiple downstream mapping TLVs, generalising target of a request (why should we only be able to ping the leaves?), maybe add a skeleton mode where only key nodes (branches/leaves) in the tree respond, ensure legacy implementations can interop with new codepoints.

want to push out new version this year then we can push for implementations.

Kireeti - seems that most presentations now are tutorials. now people read the draft to the audience. Am I the only person to notice this?

Adrian - any Qs on material of this draft?

Slides: IETF67-MPLS-P2MP-PING.ppt [2]

George - MPLS P2MP OAM
----------------------

failed to get this in on time for Montreal. Since then have been talking to Rahul, Dave Oran, Dave Katz on using BFD instead of new msg. Dave Katz wrote a good first draft but wants to do more polishing and went missing in action and uncontactable for last 4-5 weeks (so his draft never got submitted).

Dave Ward - I found Katz, and we expect BFD portion to be submitted any day now.

this draft is rewrite using multicast BFD (so more like UFD!)
many P2MP-TE apps will need heavyweight OAM (high revenue, high risk apps - major downside if drop TV feeds to 1000s of subs, or deliver ticker info to one brokerage before another). So high effort acceptable for monitoring... e.g. interest in egress repair within 50ms. So aim here is that egress can monitor feed to determine if it's alive and take appropriate actions to switch quickly. must scale to 1000s of endpoints.

builds on MPLS BFD, bootstrapped with LSP ping to create binding from FEC for LSP ping and discriminator for BFD. request may include info on what you want the receiver to do as an alarm action. also includes admin down field so that you can take particular leaves out of service (if you used BFD you'd take the whole tree out of service).

required retransmit interval has reserved 0xFFFFFFFF value to say "don't respond at all to this BFD packet".

discrimination has to be based on source discriminator rather than dest. Initial draft that Dave sent out needs a change to say clearly that using source discriminator, not dest.

if you want unreliable notification then can jitter. if you want reliable then keep polling or send unicast polls to nodes that don't respond.

this is 1st draft, but like to see what interest level is and whether can become WG draft.

Dave Ward - one more detail. When you send BFD you send from source to branch point then branch to branch then branch to leaf (i.e. hop by hop), not source via branches to leaf (i.e. end to end). So that's to avoid the massive amplification you get if you send end to end - i.e. don't overwhelm the network with OAM.

George - not sure we'll meet the response time that way.

Dave - remember that branch may see multiple sources. but if branch is singly attached then you'd be hosed if it lost that source. Also note that the BFD spec will change so have asynchronous branch mode session, so can create state machine but not reply unless you get specific message asking for reply.
so will be head driven but don't necessarily need to send polls with reliable communication.

Toerless - seems as if we're sticking to a single P2MP tree and you want BFD liveness for that. If I just take BFD from source and replicate along tree vs doing hop by hop termination and resending, then what would be different at the tail end?

Dave - point is that depending on topology you may not want source to send all the way to the leaves.

George - what I need is the option to do it all the way.

Toerless - so what's the simplest example of where hop-by-hop is different from replication in the tree (except for the fact that you don't actually test the tree)?

Dave - we need to show those topologies in BFD WG.

Yakov - we need to show semantics of BFD when terminate the session at each hop. if doesn't follow same path as data then it will be useless.

George - agreed but this isn't the BFD WG so let's get the draft out and discuss in Prague (where we'll ask for it to become WG).

Slides: Point 2 Multi-Point OAM [3]

Tom - extensions to RFC4379 in support of link bundles
------------------------------------------------------

really just link bundle extensions (not 4379bis).

straightforward problem - 4379 won't work correctly with link bundles. Most implementations include link bundle rather than component links. for troubleshooting it's nice to know which component links are in the bundle and how they're hashing. Just knowing the bundle is broken isn't always enough.

hit this in a real network about 6 months ago. result was two hours troubleshooting time - so figured out needed a way to get more visibility.

Need to be backwards compatible with current LSRs and must support variety of load balancing techniques for bundles (just like LSP ping supports multiple mechanisms).

overview - extend LSP ping echo request with link bundle TLV. decided to keep virtual interface in DS MAP TLV. Added new DS MAP TLV for bundles. Can associate that to the DS MAP interfaces.
also altered algorithm for processing DS MAP at source and midpoints. Midpoints compose link bundle interface and component links appropriately.

receiver gets echo reply with DS MAP TLV as today and will iterate over it for components.

Older implementations ought to ignore link bundle TLV without issues.

Draft needs additional editing. Need sense of WG as to whether this is a good starting point and if it should become WG doc to satisfy the charter item.

Mustafa - you changed the downstream mapping TLV. New revision number for echo reply msg?

Tom - just new identifier type to say you've got a link bundle. So do as today, but say "link bundle" instead of IPv4 etc.

not many have read the draft. Most of them think is a problem we need to solve (possibly because they were the authors!) Want to take it to the list...

Slides: draft-nadeau-rfc4379-bis-link-bundle-00.txt [4]

Tom - detecting MPLS data plane failures in inter-AS/provider scenarios
-----------------------------------------------------------------------

presented a couple of meetings ago. draft expired, was forgotten about, resurrected.

Problem is how to handle inter-AS and inter-provider scenarios with LSP ping.

problem is that 4379 sends echo requests and assumes that echo replies can get back to source. not always the case in inter-AS/inter-provider (loopback routes may not be distributed from one AS to another).

need additional info that lets P routers understand where there's a relay point that can get the packet back to the source. so new TLV which keeps a stack of ASBR addresses (each ASBR tacks itself on as it processes the request).

JL - need a few clarifications. Is the new TLV processed during ping or traceroute? During ping the transit LSR will process the packet - so no longer exercising the data path!

Tom - good question. The idea is that this happens in normal processing. But packets may stop at ASBRs anyway. Could play with TTL to make that happen.
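
The ASBR address stack Tom describes can be pictured with a minimal Python sketch. Everything here is hypothetical: the class and function names are made up, the addresses are documentation prefixes, and nothing reflects the draft's actual TLV encoding. It only assumes that each ASBR appends its own address as it processes the request, and that a node with no route to the source relays its reply back through the recorded ASBRs, most recent first.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EchoRequest:
    """Hypothetical model of an LSP ping echo request crossing AS boundaries."""
    source: str  # originator's address
    # Illustrative stand-in for the new TLV: ASBR addresses in the order visited.
    asbr_stack: List[str] = field(default_factory=list)


def asbr_process(request: EchoRequest, asbr_address: str) -> None:
    """Each ASBR tacks its own address on as it processes the request."""
    request.asbr_stack.append(asbr_address)


def reply_relay_path(request: EchoRequest, source_reachable: bool) -> List[str]:
    """Return the hops an echo reply would be sent through.

    Assumption: a node with a route to the source replies directly;
    otherwise it walks the recorded ASBRs from most recent to oldest,
    each one relaying the reply closer to the source.
    """
    if source_reachable:
        return [request.source]
    return list(reversed(request.asbr_stack)) + [request.source]


request = EchoRequest(source="192.0.2.1")
asbr_process(request, "198.51.100.1")  # ASBR leaving the source AS
asbr_process(request, "203.0.113.1")   # ASBR entering the remote AS
path = reply_relay_path(request, source_reachable=False)
```

In this sketch a P router in the remote AS with no route to 192.0.2.1 would hand its reply to 203.0.113.1, which relays it via 198.51.100.1 to the source.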
JL - maybe you could start with traceroute to record all ASBRs and then do pings and add list of ASBRs, so that when a failure occurs the router detecting the failure can use that list of ASBRs to get back to the source.

Tom - had thought of that but issue is that topology may change between sending the traceroute and sending the ping.

Mustafa - if you're a P router detecting a failure then how do you know you need to reply?

Tom - follow today's rule.

Mustafa - but how do you know you need to respond to ASBR?

Tom - ASBRs stamp the packets as they go through. Idea is that TTL is used to make packets expire at ASBR. Not optimal.

Mustafa - but could flood ASBRs

Tom - hopefully if sending messages won't bombard ASBRs with loads of messages

Ina - for ping or for traceroute?

Tom - both?

Ina - why do this on ping? could just have ping not replying and then do traceroute to see where the problem is.

Tom - generally issue is you have to timeout a few times to figure out it's not a false negative. But sure, you could traceroute - assuming other providers let you trace inside their AS.

Ina - tradeoff is load on ASBR vs benefit.

Tom - one thought is could tag messages in special way so ASBRs could optionally ignore them or just forward them along (rather than building TLV stack?) if you have address to respond to then everything works fine. So if could just tag packet in certain way then ASBR wouldn't fill in address, and just works.

Greg Mirsky - purpose of having ASBR info is to ensure follows original path?

Tom - no, just ensures you have a way to get back to the originator. So ASBR could even stamp another ASBR's address in there. Is just proxy who knows how to get back to the source. Provides "trail of breadcrumbs".

Greg - can't you just resolve with BGP policy?

Tom - but P router doesn't have BGP

Greg - P router has exit point whether single or multi-homed. eBGP speaker can get back to source.
Trying to come up with a solution for a problem that can be addressed another way is odd.

Loa - discussion at risk of becoming unproductive so let's take to list.

Slides: draft-nadeau-mpls-interas-lspping-01.txt [5]

Tom - LSP ping bis draft?
-------------------------

issues I want to address on LSP ping are making me think we may need a bis draft. e.g. one complete contradiction in the doc on various types of downstream mapping info. Also found errata process has become considerably slower than the updated RFC process. Errata would be correct solution if it wasn't broken! Also have formatting errors etc.

One technical change - high order bit of an object is used to say if is mandatory (and send error if don't understand) or optional (and drop if don't understand it). That was good for ping when saw as point-to-point message and could iterate, but for P2MP things you have a means of extending things for a single transaction that doesn't apply to a situation where a bunch of the nodes understand the options and others don't. Don't want to get error messages from all the nodes that don't understand a new option. Idea is to use another bit to extend semantics.

Adrian had another idea for an extension.

Do we think this is a good idea?

No response - so Tom will write draft.

Adrian - P2MP-TE MIB
--------------------

became WG doc since last meet. Based on existing RFCs with extensions to TE MIB but none required for LSR MIB (as was built with ability for MP2MP cross-connects at the LSR). Text will be added to show how to use LSR MIB in P2MP cases.

should get implementation feedback soon, but need more review before it goes forward (don't want people to implement it unless it does what we want).

Loa - forgot to mention earlier that P2MP-TE draft sent to ADs.

Slides: IETF67-MPLS-P2MP-MIBs.ppt [6]

George - RSVP Generic Error
---------------------------

RSVP takes a minimalistic approach to error reporting. Good philosophy in general, but in labs etc.
it's nice to have a bit more info. Idea here is to add a means for developers to add "generic errors" and then their own optional objects with user error specs (using enterprise/org numbers etc. to make them unique). Also can be useful when doing interop testing, but idea is that don't go in production code (though harmless if they are in there).

next step is to change from OUI to enterprise numbers.

is it a good idea?

Slides: RSVP Generic Error [7]

JP - Node behaviour...
----------------------

remember soft preemption? Provided notification to head-end rather than just tearing down tunnels. so to define soft preemption we had to define hard preemption. some people were confused as to default behaviour for preempting a tunnel. This draft doesn't define new protocols, but is BCP to document how a node generating/receiving a PathErr ought to act.

as stated in 2205 the PathErr doesn't impact LSP state - but just forwarded to head end.

in this doc we want to define fatal/non-fatal errors.

fatal error - detecting node sends PathErr and clears LSP state.

non-fatal error - sends PathErr but no change in LSP state.

receiving node never changes LSP state.

Proposing a column in IANA registry defining if errors are fatal or not.

as far as we know this complies with most implementations today... Can everyone please check that their implementations agree with this so we can reach consensus?

We'd like to get WG agreement to document this. also quick consensus on error values 2 and 5. then we can move forward on soft preemption ID. Also will need to keep the new column in the registry up-to-date.

Adrian - I'd like people to understand that the draft is a statement of what we think is going on out there. Need other vendors to agree that this is what's real.

Loa - WG already decided we should document this. the issue here is rather that we need vendors to review.

JP - will poll list to check all in sync.

Lou - is there agreement on behaviour for fatal error?

JP - LSP goes down.
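
The fatal/non-fatal split JP presented can be sketched roughly as follows. This is an illustration only: the registry contents, error names, and the `Node`, `detect_error` and `receive_path_err` names are all hypothetical and are not taken from the draft or from the IANA registry. It assumes a per-error fatal flag, a detecting node that always sends a PathErr and clears LSP state only for fatal errors, and a receiving node that never touches LSP state.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical stand-in for the proposed registry column (error -> fatal?).
# The error names are made up for illustration, not real IANA values.
IS_FATAL: Dict[str, bool] = {
    "example-fatal-error": True,
    "example-non-fatal-error": False,
}

PathErr = Tuple[str, str]  # (LSP id, error name)


@dataclass
class Node:
    name: str
    lsp_state: Dict[str, str]  # LSP id -> state held on this node


def detect_error(node: Node, lsp_id: str, error: str) -> PathErr:
    """Detecting node: always send a PathErr, clear LSP state only if fatal."""
    if IS_FATAL[error]:
        node.lsp_state.pop(lsp_id, None)
    return (lsp_id, error)


def receive_path_err(node: Node, message: PathErr) -> PathErr:
    """Receiving node: pass the PathErr on towards the head end, state untouched."""
    return message


detecting = Node("detecting", {"lsp-1": "up", "lsp-2": "up"})
transit = Node("transit", {"lsp-1": "up", "lsp-2": "up"})

forwarded = receive_path_err(
    transit, detect_error(detecting, "lsp-1", "example-fatal-error")
)
detect_error(detecting, "lsp-2", "example-non-fatal-error")
```

After these calls the detecting node has dropped only the LSP that hit the fatal error, while the transit node still holds state for both.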
Lou - but are you specifying the behaviour? So proposal is to change registry. But is it clear what to do if you get TE failures?

JP - e.g. if link failure you can generate PathErr

Lou - are we documenting what ingress then does etc.?

JP - no, just documenting what detecting node should do.

Lou - so when get fatal path error should flip "path state removed" bit? Am I missing something basic?

Adrian - I think that Lou is asking are we documenting the protocol behaviour for soft and hard failure cases. That leads on to soft and hard preemption. My view is that a "yes" in this column documents protocol behaviour, and also nodal behaviour.

JP - that's in the doc.

Slides: IETF-67-MPLS-PathError.ppt [8]

Bob - LDP capabilities
----------------------

this is joint draft with Shivani/Rahul/JL.

history is that prior to Montreal we had 2 drafts. Very similar so agreed before IETF to do joint presentation and then to merge. that's all happened now.

motivation is that we've added various enhancements to LDP. LDP lacks any framework for managing use of enhancements.

this draft defines a capability advertisement mechanism and a way of enabling/disabling them.

points important to recognise:

1) announcement of capability documents what speaker can receive/process - not what it can send

2) allows speaker to tell peers if it supports deviations from 3036. e.g. at initialisation time a GR node can inform neighbours that it will retain label state when LDP fails.

not every enhancement needs a capability.

each speaker may have 0 or more enhancements to 3036 enabled. If enabled, speaker will perform associated actions.

LDP defines notion of optional parameters in LDP messages. all parameters are TLVs. So this draft lets speaker announce its enabled enhancements by advertising capabilities in init msg.

draft also specifies particular enhancement of ability to handle dynamic capability advertisements.
Purpose of that is to make it possible for speakers to enable/disable enhancements by advertising/withdrawing capability after init.

some more info in draft - would like to point out that draft specifies common encoding for TLV used to advertise/withdraw capability. has common part which includes bit for advertisement/withdrawal.

subsequent to submission of draft, authors plus Yakov have had further discussions, so will issue new version with those decisions in.

At Montreal concept seemed to be accepted. What do people feel now about it?

Slides: ldp-cap-IETF_20061107.ppt [9]

Bob - LDP typed wildcard
------------------------

as noted in Montreal the wildcard FEC isn't useful. Yakov noted that if there's a problem then we need to fix it! Ina and Bob wrote draft showing difficulties with wildcard FEC as defined in 3036.

two deficiencies:

1) specifies all FECs regardless of type. was OK in early days of IPv4 prefixes only. Good to be able to do FEC type specific wildcard.

2) only allowed in withdraw/release. can't remember why we didn't allow it in label request. could be useful there - e.g. mLDP (next revision of spec will use typed wildcard for request).

so this draft introduces the typed wildcard FEC and allows its use in label requests. also specifies the typed wildcard FEC element for the prefix FEC (since need to identify the address family). needs a new IANA codepoint.

one thing to say is that need to decide if FEC element has to support typed wildcard. Wasn't up to authors to decide that for FECs, but up to the designer of the FEC.

also need to ensure backward compatibility. Non-issue as 3036 says that if you don't understand a FEC type you should ignore it and send a notification.

done a reasonably complete job of specifying this.

final issue is error handling.
If a router implements this and receives a message that includes a wildcard FEC for a FEC type it doesn't support, or for which wildcarding isn't supported, then action is to respond by sending a notification with unknown FEC status code.

we think spec is reasonably complete. if not then please let us know.

we think doc addresses an issue that requires a mechanism not currently in 3036.

Loa - how many have read it? Answer is not so many. So take the discussion to the list... had pretty good support when started so need to push list to get more opinions...

Slides: ldp_typed_wildcard_IETF_20061107.ppt [10]

Bruno - inter area LSP
----------------------

problem statement. MPLS VPN networks are expanding and MPLS backbones are growing (density and footprint). So more IGP areas being introduced and need inter-area LSPs.

LDP won't set up inter area LSPs if addresses are aggregated.

two solutions today:

1) leak /32 PE routes. Scaling issues here.

2) MPLS hierarchy with MP-BGP between areas. But now need BGP on ABR, and have issues with BGP convergence if ABR fails (and can't do LDP FRR if ABR fails).

so extension here allows LDP to do inter-area LSPs even with IP aggregation. So now does longest match in the RIB instead of exact match. FECs selected this way will be advertised in ordered mode. other FECs can be independent or ordered.

v1 presented in Vancouver. this v3 has problem statement/editorial changes, but tech spec is stable.

in summary - this is straightforward, minimal changes.

Eric - I'd like to express issues here. The reason LDP requires an exact match is so that paths won't differ from those in the routing algorithm. Once you allow longest match then there's a risk that the IGP and LDP paths could differ. Your draft says "don't do this", but need a precise statement of conditions under which LDP may put in/take out entries from the forwarding table based on various routing events. Also I can't shake the feeling that this calls for a hierarchical solution.
Sure, IGPs aren't constructed to carry 20k or 50k /32s. BGP is possible as you say. Don't want to run BGP as turns ABR into ASBR. Not clear we can do hierarchy using off the shelf tools without creating AS boundary. Also issue you have with FRR. Not sure we have very standard LDP-based FRR today that will repair around an ABR. Presumption is that could work around ABR but not around ASBR. Would like to see that addressed before becomes a WG doc.

Loa - you're thinking on similar lines to me. could you write down the concerns and send to list to initiate discussion?

Eric - yes

Loa - want to cut off here and take to list.

Slides: LDP extension for Inter-Area LSP [11]

JL - P2MP MPLS-TE Fast Reroute with P2MP Bypass Tunnels
-------------------------------------------------------

background is that P2MP spec defines extensions to facility/detour to do FRR of P2MP-TE LSPs. only relies on P2P bypass LSPs. so PLR has to replicate traffic during failure and may result in inefficient b/w usage. may have 10s of LSRs downstream of protected node (have a case with 30 LSRs downstream).

to overcome this limitation while using MPLS hierarchy, have extended facility backup to enable P2MP backup tunnels (can protect several primary LSPs and will tunnel traffic in P2MP LSP towards all merge points using a label stack).

inner label is backup LSP label, new outer label is P2MP bypass tunnel label. (Issue is that need upstream assigned backup label since each Merge Point gets the same backup label).

Merge Point needs context-specific forwarding table for the corresponding P2MP bypass tunnel - so will look up the backup label in that context.

have received good comments on-list and offline which need addressing.

draft assumes primary LSP protected using a single bypass tunnel, but P2MP LSP could be protected by combination of P2P and P2MP bypass tunnels (useful if lets us use pre-established P2MP LSPs). But remember in multicast there is always a tension between replication and state.
could use P2MP bypass tunnel whose leaf LSRs are a superset of the protected downstream LSRs (again tension between optimisation and state).

Also need procedures for LAN interface protection.

Will ask for WG adoption in Prague.

George - I wanted to do this in base P2MP spec but left out because of delays on upstream allocation (thus always planned as followup work). One Q - is there any place where we can explicitly signal that we want the last hop to indicate something other than PHP? Ought to be able to signal that.

JL - the LSP stitching draft requires that PHP must be deactivated on segment. May need a more generic flag to disable PHP.

George - agreed. Also draft needs to describe a few more scenarios.

Toerless - to me there is a lot of complexity resulting from scalability/co-ordination issues. Can we use high availability mechanisms instead? Do we actually have all the graceful shutdown procedures etc. for that? The graceful shutdown soln should be simpler.

JL - agree that graceful shutdown and FRR may interact. But in core FRR is better as it's hard to differentiate between control plane and data plane failures. Unplanned node failures are much rarer than link failures, but we do have unplanned node failures and need to cover with e.g. sub 100ms for TV.

Toerless - would be nice to document what failures can only be solved using this mechanism.

JL - have same issue for P2P link/node protection. We want same availability for TV traffic as unicast traffic.

Toerless - in general slides don't answer scalability impact of this. Lots of impacts here.

Mustafa - why would you want such a complex mechanism? Could get head end to reoptimise tree, so replication is temporary.

JL - in my operational figure this will lead to 30x increase in link traffic, so FRR will congest the link.

Mustafa - we could look at static bypass that does P2MP. Don't try to signal on the fly. Just create and make available. For stuff like TV we don't need much dynamicity.
JP - the way you compute the backup path is irrelevant.

Dimitri - what detection mechanism do you use for failure? You can have P2MP failure without node failure. Need to check all cases covered here with suitable detection mechanisms. Don't want a partial solution.

JL - don't see anything different to P2P case. will use same detection mechanisms. So you will activate this regardless of how many LSPs affected. branch point is branch point for multiple LSPs.

Kireeti - quick comment (echoing Toerless). Node protection in general is complex and less scalable than link protection. Let's step back and ask SPs to tell us how often unplanned node failures occur. Should we continue down this path of making node protection work in increasingly complex cases?

Slides: P2MP-FRR-1.0.ppt [12]

JL - IGP Routing extensions for discovery of P2MP TE LSP Leaf LSRs
------------------------------------------------------------------

v2 of this draft.

Problem statement is that e.g. if doing TV broadcasting with regional sources you need sets of P2MP-TE LSPs with distinct ingress LSRs but common leaf LSRs. could manually configure on each ingress LSR, but is prone to misconfiguration and doesn't allow dynamic addition/removal of leaves. so would be good to be able to discover leaves. This draft only automates that. Does not address interaction with multicast receiver activity.

makes sense to rely on IGP for simplicity. Especially when BGP-free core.

various changes.

want SP feedback on problem statement and WG feedback on soln. want to be WG doc.

Slides: AUTOLEAF-1.0.ppt [13]

Luyuan - MPLS/GMPLS security framework
--------------------------------------

why do we need to do this? several CCAMP drafts got stuck because security ADs questioned our understanding of security (esp inter-AS, inter-provider). so CCAMP and MPLS WG chairs decided we needed to fix this.
aim is to have one doc as security framework for MPLS/GMPLS that other docs can point to (and then point out any specific issues that they have). will be informational, and will apply to anywhere MPLS or GMPLS is applied.

the aim is to form a DT and do a draft before next IETF.

outlined draft. can't outline general IP security as already done elsewhere. Just specific MPLS/GMPLS issues.

draft is not a solution draft, but may mention solutions you can use as best practice. More work may be needed (e.g. on label spoofing).

current DT members listed.

Yaakov - you could look into work coming from PWE3. People there were asking if we should work at PWE level or MPLS level. I'm willing to join your design team.

Luyuan - good to put all general MPLS/GMPLS stuff under this umbrella.

Peng - have you talked to SIDR group? they have similar ideas/requirements.

Luyuan - SIDR is more general. This is MPLS/GMPLS. but we'll look into it. Welcome people to join DT or to review.

Loa - anyone interested contact Luyuan.

Slides: MPLS GMPLS Security Framework.ppt [14]