July 12, 2006
Minutes of the MPLS WG
Montreal, Canada
=======================================================================

T-MPLS liaison - Loa
--------------------

The work is occurring in Q9 and Q12 of SG15. IETF was invited to the
Ottawa meeting last month (as was MFA - who pretty much supported IETF
view). 3 IETF reps. IETF hijacked agenda (2 days discussion instead of
planned afternoon). Good progress.

A couple of liaison statements have been exchanged between IETF WGs and
ITU-T SGs on T-MPLS. (Copies on the data tracker.)

MPLS WG had four major comments:

1) Requirements. ITU-T responded that their requirements were implicit.
   Loa's view is that it's hard to agree reqs unless they're written,
   but at any rate that's the ITU-T's process, though they have
   promised to write them down and send them to us in this instance.
2) PHP. Long discussion: Move from "do not support" to "do not use" is
   a step in the right direction. MPLS issues with PHP don't occur as
   P2P Ethernet only in first phase. ITU-T accepted that PHP can be
   useful! (to avoid multiple popping). Acceptance that it will be used
   in future versions (to do MPLS services over T-MPLS).
3) ITU-T agreed to use MPLS Ethertype - makes IETF the design
   authority. Agreed to use (G)MPLS change process for extensions.
   Adrian will do something ASAP so we can forward the doc to the IESG.
4) Agreed not to reserve labels in advance but to come to IETF if ITU-T
   need any (as they did to get label 14 for Y.1711.)

PWE3 liaison - stated that Y.1711 will be used in first version.
Motivation is probably that it's all they have. Will revisit in future
(in cooperation with Q5/SG13).

We need to respond to outstanding liaison on G.8110.1 by Aug 1 so
please read/comment. WG chairs for MPLS and PWE3 will create a response
jointly.

P2MP-TE - Rahul
---------------

Brief update on P2MP RSVP-TE draft. Draft has gone through last calls
etc. Going through major comments, and showing how they're being
addressed.
1) Clarifying scope of P2MP ID as root node scope. Need to drop one
   comment and replace by another.
2) Semantics of P2MP ID. Removing comment.
3) Tunnel ID semantics as in 05.txt.
4) Procedure for fixing re-merge has two options (data plane based and
   control plane based). Data plane one wasn't clear enough. So
   rewording to explain that PathErr has two sets of objects.
5) No change to wording on interop of two re-merge approaches.
6) Some other clarification.

So 06.txt will be posted to incorporate consensus. Will then ask for
comments on delta between 05 and 06.

P2MP-TE MIB - Adrian
--------------------

Rule is that if you have a protocol then you need a MIB (useful rule!)
Already have MPLS TE MIB. Protocols built on TE protocols. So trying to
reuse as much as possible.

Turns out that no changes are needed in LSR MIB as it allows arbitrary
cross-connects. Need to explain how to use the LSR MIB (easy). Do have
to be careful with bud nodes (drop and continue) to make sure the text
makes it clear how those should be treated.

TE MIB needs more work as it models the protocol rather than the
switch. Need to redefine interpretation of a couple of objects. Key one
is mplsTunnelEgressLSRId. Also extensions for extra info.

Adrian presented a diagram showing changes to MIB. Change is that some
info now needs to be per-destination rather than per tunnel - so need
to go via an extension which has a table per leaf which can then point
to EROs/RROs. Also can do performance per branch.

Outstanding question on modeling sub-groups (done in protocol). If so
then slot between tunnel table and tunnel dest table. Not modeling
protocol perfectly (as has ERO and set of secondary EROs, but we're
putting it all in the same level as destination - probably the right
thing to do as at config time you don't know which will be the primary
ERO).

Next steps include getting operator input on whether this is correct
plus review from MIB experts and those who did the protocol.
Technically we need a MIB for any WG doc at the time that protocol goes
to IESG for review.

LDP Capabilities - Bob/Rahul
----------------------------

Rahul
-----

2 overlapping drafts - one from Bob Thomas, one from
Shivani/Rahul/John. An agreement to merge drafts was made among the
authors prior to the meeting.

Key issue is that LDP has been enhanced (e.g. GR, FT, upstream label
assignment). Until upstream draft there was no mechanism in LDP for
managing the use of LDP enhancements. May be future enhancements down
the road.

So capabilities are associated with enhancements. It isn't the case
that all enhancements need to advertise a capability. Upstream label
does. But need a mechanism for peers to negotiate the capability at
session establishment time (or any time later). Allows a peer to
activate an enhancement by activating the capability. Have leveraged
the mechanisms used for BGP capabilities.

Loa - what's the delta between the drafts, and how do you solve that?
Rahul - 2 problems. Advertising capabilities at session establishment
and doing so later without tearing down the session. Up to WG to
provide feedback on that. One draft does both, one only does session
establishment. But WG may say they only want to solve one problem.

Bob
---

Plan is to have a draft for each enhancement, and that if the
enhancement needs a capability then that would be documented in the
draft. IANA would assign a code point for that capability.

Assumption is that each LDP speaker implements a set of capabilities
(may be empty set). If a capability is enabled then speaker will
perform associated actions. Enabled capabilities are advertised at
session establishment time. One capability will advertise that speaker
can do dynamic capabilities. If enabled then at any time later one or
other speaker can enable a dynamic capability.
There's an acknowledgment mechanism for dynamic capabilities to ensure
that message is processed with the same capability state as it was
generated (same mechanism used for dynamic BGP capabilities).

Also new capability message defined (for dynamic capabilities).
Optional overall, but needed if supporting dynamic capabilities - and
only sent if both speakers advertised dynamic capabilities during
initialization.

So next step is to produce merged draft with detailed procedures. Will
then enable a new version of the upstream draft to be written with a
definition of upstream label assignment capability.

George - my only comment is that I wish we'd started this a few years
ago.
Loa - do we have a routing area AD here? Don't seem to. But well within
charter, especially as have taken on upstream work.

IANA registry for flags field in RSVP session object - Adrian
-------------------------------------------------------------

Issue is that we only have 8 bits. Registry was originally in RFC4420
(LSP attributes). Out of scope for that so putting it in a new ID. But
may be other registries that would be useful. But no point sitting on
this work while we figure out what else we need to do.

Now a WG ID - Loa gave permission last night and Adrian submitted last
night. So Adrian's view is we should just last call it.

George - I think I should have equal say in this ;-) But no objection
in this case...

Updating upstream labels - Eric
-------------------------------

This is an open issue in upstream label draft so I want to brainstorm
on a solution.

We know that we want to use Upstream Assigned labels on multicast data
packets. Easy to do on P2P or P2MP media as you know who put the top
label on. Problem comes with MP2MP media such as Ethernet or MP2MP LSPs
(work in L3VPN on the latter).

In theory could use L2 header info (at least on Ethernet) but not such
a great idea as changes the forwarding logic to lookup the MAC source.
Also want to avoid operational impact.
Solution is a context label - so push on UA label then a context label
to tell receiver who put the packet on. So context label lets you
identify table to lookup UA label. But problem is that context labels
hit the same problem (as they would be upstream assigned as well) -
though there would be fewer of them.

Loa - can't you assign them downstream?
Eric - no as this is MP2MP.

Options:

1) Take low-order 20 bits of IP address. Probably unique in the
   required scope. Certainly in Ethernet is useful as routers are
   likely to be in one subnet. But in MP2MP LSP might not work as no
   presumption that will be in common subnet.
2) Overflow into a second level stack entry. Transparent to forwarding
   path as downstream doesn't need to know where label ends (can still
   use first label to look up into a second table). If each node needed
   <= 256 UA labels then could work as we have a total of 40 bits (32
   for host, 8 for label). Or we could use low order 24 bits (probably
   unique in SP network) and then get 64k UA labels per node.
3) Use random number. Seems strange. Would need collision
   detection/resolution. But turns out that probability of collision is
   low given 20 bit space (would be even lower with 24 bit space). Also
   since all nodes don't have to choose simultaneously you could avoid
   any numbers that had already been picked.

A very long discussion ensued without conclusion. To be continued on
the list. Giles' raw notes on that discussion are attached at the end
of the minutes.

P2MP LDP requirements - Jean-Louis
----------------------------------

Since Dallas, the draft has become WG doc.
We added req that leaf add/remove mustn't impact data transfer towards
other leaves (important), detailed other reqs, removed controversial
text on routing protocols (used to say shouldn't require deployment of
new RP - now says solution should have key objective of minimizing
additional state/processing in network), detailed reqs on MP2MP LSPs,
incorporated comments from Ben Jenkins, other clarification edits.

P2MP rerouting. Quite challenging when reroute isn't due to network
failure - since we want to avoid packet loss/duplication during
reroute. Hard to do. Always tension between loss/duplication in
connectionless schemes like LDP. So req is to minimize loss/dup and be
able to avoid one or the other (but not both). Ideally let operator
choose between loss and duplication.

MP2MP LSPs are optional as there are other approaches at application
level. At any rate reqs for P2MP also apply to MP2MP. Especially must
support transit node failures including root node failures. Also must
avoid routing loops that cause exponential growth of traffic. Challenge
here is that LSR may receive traffic for an LSP over multiple
interfaces. Also added some reqs - e.g. not receiving back traffic that
you sent on the MP2MP LSP.

So draft is now pretty stable but need more feedback - esp on new
changes. Plan to update draft to account for comments by end of Sept.
So v02 should be ready for WG last call.

Rahul - does draft talk about MP2MP case on LANs?
Jean-Louis - only cover P2MP on LANs saying that should avoid ingress
replication. Req applies equally to MP2MP.
Yakov - on packet duplication/loss doc needs to be *very* clear that
you can avoid either but can't avoid both.
?? - in which situation would you allow duplication?
Jean-Louis - some apps may not be sensitive to duplication but may be
to loss.
?? - examples?
Jean-Louis - can't think of any

mLDP draft - Bob Thomas
-----------------------

Draft is by Ina/Ice. Update to status of the LDP P2MP draft.
Ice was going to present and had prepared slides but thought the
meeting was Monday and couldn't be here. Not much time so will go
through topics quickly.

1) Root node redundancy. Problem in MP2MP LSPs as only one root
   address - suggests a SPOF. Draft has mechanism for providing
   redundancy if root node fails - by assigning multiple roots. Each
   node is configured for a set of roots, accepts traffic from all but
   only sends to one (local policy). So effectively you get multiple
   MP2MP LSPs (one per root). If a root node fails then nodes stop
   sending to that LSP and use another one.

   George - how do we discover failure?
   Bob - IGP. So leaf can switch at IGP reconvergence speed. Also
   allows load-balancing over multiple roots.

2) MLDP Make-before-break. Intent is to minimize packet
   loss/duplication during switch-over from one tree to another. 2
   situations to apply this - link failure (so need to re-signal), or
   new link etc. So single mechanism for both of these. Acknowledgment
   mechanism. Message is sent by upstream LSR to a downstream that has
   been sent a label mapping. Ack is sent when LSR is attached to the
   LSP (i.e. presence on LSP has been signaled to root and acknowledged
   by upstream).

   Rahul - have issue on make-before-break. You're adding complexity to
   mLDP to solve a problem which you don't really solve. If you want to
   solve it then use RSVP-TE (which has its own tradeoffs - granted).
   Bob - but in case of good news we do have make-before-break here.
   Switch-over to new LSP doesn't occur until signaling has been
   completed and acknowledged.
   Rahul - in that case I'd be impressed if you can prevent both
   duplication and loss
   Bob - I believe you don't have either in the good news case.
   Rahul - don't believe that's possible.
   Jean-Louis - even with notification you may have transient
   duplication.
   Bob - needs atomic forwarding plane update.
   Rahul - I don't think that's enough.
   Summary - these procedures should be optional, don't require
   data-driven events to trigger switch-over from old LSP to new LSP.
   But need to convince WG.

3) Upstream LSR selection in LAN case. Would like to avoid sending
   duplicates to downstream routers. Would like downstream routers to
   all pick the same upstream for an LSP. Mechanism is to number
   upstreams from lower to higher IP address and then to use a formula
   to pick one (sum opaque bytes in the FEC and modulo it with # of
   upstreams).
4) Wildcard FEC. Intention is to use this for label request, withdraw
   and release (to make peer re-advertise/withdraw/release all mappings
   of FEC type).
5) Generic LSP identifier.

Yakov - why not use wildcard FEC in LDP at large?
Bob - 3036 has wildcard FEC but two flaws: 1) not the notion of FEC
types (so withdraws everything) 2) use is restricted (can't be used for
label requests).
Yakov - so you're saying that 3036 has defective wildcard FEC. So let's
correct that!
Bob - tried to fix it in 3036bis. But pointed out that to do that
wouldn't be following IETF procedure.
Yakov - so write a new doc that is applicable to LDP at large (not just
mLDP) and can be used for any FEC type.
Bob - good suggestion

Load-Balancing between candidate upstream LSRs - Shuying
--------------------------------------------------------

When there are multiple candidate upstream LSRs the load may be
unequally balanced between them. Also may get traffic disruption in
some cases (adding LSRs or better path emerging).

The mechanism here does per-flow balancing. Downstream selects
candidate based on ECMP algorithm using node address and opaque value
in FEC as inputs.

Aim of make before break is to make traffic continuous. So send
upstream label request to new upstream LSR. Don't install new label in
LFIB when receive label mapping, but wait until an unknown multicast
packet is received.

MPLS Multicast Verification - George
------------------------------------

We talked to various customers.
Showing loosely disguised slide from one of them. Issue is that the
economic impact of even a brief outage may be serious (either direct
liability or because other competing technologies may steal customers).
Example applications are stock tickers or TV feeds.

Dino - quantify those outages.
George - 50ms. When you drill down the requirement might not be that
stringent but need to aim higher than you're gonna hit. Bullseye is
zero :)

The fast recovery time drives the need for heavy-duty OAM. May get to
the point where if there's a problem they'll shut down feeds to
everyone (rather than losing feed to one broker who will then sue
them).

Also issue of very high numbers of end points. So need some kind of
auto configuration.

Solution is a two-phase approach (config and then heartbeat both from
head end). Heartbeat (phase 2) is very simple (and will be sent at high
frequency). Not yet defined if will be a new packet type or not.

LSP echo request. Carries FEC stack and discriminator. First thing you
do if you receive it is to verify that you're actually the exit point.
Define minimum interval for transmit. Also multipliers for down
indication and a period to jitter replies. Refresh interval for control
message (soft-state model). Also flags for admin down and FDI. Might
change to a code-point rather than flags.

Current thought is that tail ends should simply be configured to accept
OAM for any combination of mLDP and P2MP RSVP-TE (so no need to
explicitly configure sessions). Might be time to add security
authorization to LSP ping, but that would be a separate draft.

Receivers start listening for these messages immediately. Head end will
start sending probes before sending a control message (to avoid race
conditions). Thought of other solutions such as longer initial interval
but these seemed a bit useless...

This could apply to other applications, but has been optimized for the
above.

So root probes leaves (respecting interval configured).
Leaves don't send replies - seems like a bad idea to have lots of
messages saying "I'm still alive".

Issues: Can we do tail repair or do we need message sent back to head
end? RDI notification in message?

Also discussion on timestamps/seq numbers to do jitter/packet loss
measurements. If can be done without the hardware having to do too much
work then can do. Or can have two packet types. Would like more input
from people on whether this kind of stuff is "nice to have" or
essential.

In the RSVP case you know all the leaves in your tree. If you haven't
heard from some then might be worth sending LSP ping with list of
addresses saying "please only reply if you're on this list". May also
be useful with mLDP.

Will be posting draft right after the meeting and hopefully bringing
something pretty mature to the next meeting. Have also agreed to work
with Adrian to see which parts of this should be in his draft.

Ben Niven-Jenkins - we (BT) have requirement for this. One comment
though - why use LSP ping to bootstrap it, why not use the signaling
protocol?
George - tension is that we have two signaling protocols but would like
one mechanism. In P2P case we did one mechanism (using LSP ping to
bootstrap), so seemed sensible to do the same for P2MP. Don't want to
create more mechanisms than needed.
Ben - in draft as currently stands not much discussion of how to
deactivate the OAM. Need procedures to deactivate OAM before tearing
LSP down.
George - right now the RDI is only control.

Support for ECN and PCN in MPLS networks - Bruce
------------------------------------------------

ECN has been a standard for a while (RFC3168). Uses 2 bits to encode 3
states (so convey "can end system support ECN" and "did this packet
experience congestion"). Prior to ECN TCP could only use drops to
respond to congestion. Actually ECN can be used with protocols other
than TCP.

Presented draft on this 6-7 years back. Re-presenting as ECN has become
more popular recently.
Issue is that only the 3 EXP bits are suitable for this, and they are
already widely used for DiffServ. Even stealing one bit could be
tricky.

1999 proposal (Davie et al) suggested stealing one bit (was a WG draft
but issues around lack of solutions to hard problems plus lack of
interest in ECN.) Can overload to encode 3 states in one bit (only
issue is that as you flip the bit you could end up flipping twice if
hit congestion twice).

2000 proposal (Shaman??) only carries CE bit. ECN is in the IP header
so no need to carry in MPLS header.

RFC3270 defines usage of EXP field for DiffServ.

So now with 6 years experience we have better understanding of both ECN
and DiffServ.

Proposal in current draft is to use a codepoint instead of a bit since
may not need ECN on all traffic (e.g. use on data but not on voice). So
potentially could deploy 6 classes and still have ECN on two of them.
So most parsimonious use of EXP. Also using Shaman draft concept of
handling ECT at egress. Also aim is to be permissive - i.e. allow other
uses of EXP.

Concrete example of adding ECN to AF11 class by adding an additional
EXP codepoint for "AF11 and CE".

One reason to be interested in this is PCN support. PCN is like ECN but
needs 3 code-points rather than 2.

To summarize: There is much more interest in ECN/PCN now than 6 years
ago. This is a very efficient use of codepoints, consistent with
earlier work, with RFC3168 and with RFC3270.

The TSV working group would prefer this to be done in the MPLS WG. The
reason Bruce started in TSV is that you need ECN expertise to
understand it. If MPLS WG is happy to take it then will do it here, and
get review from ECN experts elsewhere.

?? - what about colors in AF11 case?
Bruce - was assuming you weren't looking at drop precedence here.
?? - what about L-LSPs, you can have 2 bits there?
Bruce - focused on E-LSPs as more widely deployed and also more
difficult.
Dino - were you thinking of using remarking logic that already exists
at ingress?
Bruce - processing on ingress is similar to DiffServ but egress is more
complex.

P2MP root discovery - Yang Yang
-------------------------------

Both RSVP and LDP drafts assume that leaf knows where the root is. But
how does leaf find the root? Could use routing protocols (drafts on
this), or could use directory services.

The assumption is that each node knows where the directory server is
(configure or find with routing protocols).

Examples with LDP and RSVP. Difference is that in RSVP case the server
needs to tell the root about the leaf, whilst in the LDP case the leaf
just does a join.

The benefit is dynamic discovery without use of routing protocols in a
manner that can be applied to LDP and RSVP-TE and which scales to large
networks with lots of changes.

We need to look at security issues (given use of DS). We also need to
look at inter-domain issues (how to synchronize DSes).

First version of draft doesn't really detail the procedures. Do we need
2 drafts - one for reqs and then one with detailed procedures?

Satoru (Japan Telecom) - requirements make sense. One question about
requirements in routing protocol.
Yang Yang - you may need to extend IGP if you want dynamic means to
find root. In inter-domain case may need BGP.
Satoru - first step is to focus on routing protocol extensions.
Yakov - I have no doubt that you can use DS for this, but why? P2MP
LSPs don't exist for their own sakes but are used for applications
(e.g. VPLS and 2547). The applications do discovery as a byproduct of
the application itself. The DS adds complexity to the system. Don't
know of any applications where the directory would be a benefit.
George - let's take this to the list. Explain the problem set and how
this would enhance the system. I had basically the same question as
Yakov - putting a meta-layer in to get the info doesn't seem to be a
solution. Need to understand how you're going to use it.
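
Editor's illustration (not presented at the meeting): the two flows
described in the root discovery talk can be modeled as a toy in-memory
directory. This is a minimal sketch only. The class and method names
(DirectoryServer, register_root, lookup, announce_leaf) and the LSP
name are hypothetical, since the draft was said not to detail the
procedures yet.

```python
from dataclasses import dataclass, field


@dataclass
class Root:
    """Root LSR of a P2MP LSP (hypothetical model)."""
    address: str
    leaves: set = field(default_factory=set)

    def add_leaf(self, leaf: str) -> None:
        # RSVP-TE style: the root learns the leaf and would then
        # signal the LSP toward it (signaling itself is not modeled).
        self.leaves.add(leaf)


class DirectoryServer:
    """Hypothetical directory mapping an LSP name to its root."""

    def __init__(self) -> None:
        self._roots = {}

    def register_root(self, lsp: str, root: Root) -> None:
        self._roots[lsp] = root

    def lookup(self, lsp: str) -> str:
        # LDP style: the leaf only asks where the root is and then
        # performs the join toward that address on its own.
        return self._roots[lsp].address

    def announce_leaf(self, lsp: str, leaf: str) -> None:
        # RSVP-TE style: the server has to tell the root about the leaf.
        self._roots[lsp].add_leaf(leaf)


ds = DirectoryServer()
root = Root("192.0.2.1")
ds.register_root("lsp-a", root)
ldp_join_target = ds.lookup("lsp-a")      # leaf-driven (LDP case)
ds.announce_leaf("lsp-a", "192.0.2.7")    # server-driven (RSVP-TE case)
```

The sketch leaves out the open issues raised in the talk (security and
synchronizing directory servers between domains).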

Routing extensions for discovery of P2MP leaves - Jean-Louis
------------------------------------------------------------

There are obvious issues with static configuration at ingress
(configuration overhead plus inability to do dynamic add/removal of
leaves).

This draft allows discovery of leaf LSRs, but not of multicast group
membership (this is not IP multicast). Relies on IGP extensions (ISIS
and OSPF node capabilities) - allowing LSR to advertise its desire to
join/leave tunnel. Very useful in BGP free and IP multicast free core
routers.

So new TLV - which carries the set of groups the LSR belongs to. ISIS
capability TLV or OSPF router information LSA. A node needs to
add/delete an entry in the TLV to join/leave group. This is only useful
for core router where join/leave frequency is low. Key is that
joining/leaving a tunnel isn't synonymous with joining/leaving a group.

Plan is to review encoding of group ID. Currently 32 bit integer. Might
become variable length opaque group ID.

This is a first cut so need more work (and more feedback) before
requesting WG adoption.

George - would like to see requirements at a higher level (same as
previous speaker). So where does this fit into the system as a whole?
Don't put stuff in the IGP just because it's convenient to do so. Need
a real strong requirement for this.
Adrian - at a high level it looks good. But we need work on
policy/security. Current draft says "no security issues" but joining
someone else's tree is a security issue!
Toerless - something like tail-end join in mLDP or PIM. Problem is
doing this by flooding. Why tell the whole world you want to join a
tree when only the head-end cares about it? Also isn't one of the
targets for this RSVP P2MP tunnels between P nodes, but not between PE
nodes? So PE does mLDP and then P node does this procedure at that
point? Unicast join might be better at that point.
Jean-Louis - there isn't 1:1 mapping of P2MP tunnels and P2MP groups.
George - are you thinking of automesh?
Jean-Louis - this is built on automesh.
Toerless - but still flooding when you have limited number of
receivers.
Lou?? - have you thought about applicability of Leaf Initiated Join to
avoid flooding?
Jean-Louis - if you do a join then which protocol will you use? We
assume you're in the core.
Lou?? - I think you should consider LIJ
JP - just a question for George. You want to see more discussion of
requirements. Requirement is well-scoped (i.e. automesh) so what do you
want?
George - want to see consensus on list that people want this...

Extensions to RSVP-TE FRR - Feng Jun
------------------------------------

Issue is that when tunnel breaks the refresh messages also switch to
the backup path. Aim is to know which protected tunnels use which
backup tunnels. PLR knows already, but MP doesn't. So tell this info to
the MP using new messages (MP_NOTIFY and MP_ACK). Then only need to
send refreshes on the backup tunnel.

JP - simple Q. What advantage does this scheme have? What problem are
you trying to solve?
George - no time for questions; ask on the list.

LDP/IGP Sync - Luyann Fang
--------------------------

This is an expired draft, Luyann wants to reissue. Targeted as an
informational RFC as it proposes no protocol changes. Same class as
RFC3137 (IGP/BGP sync). Would like to ask for show of hands if can be a
WG doc.

George - needs to ask on the list, but might be worth getting sense of
the room now...

Sense of room seemed favorable - will be decided on list.

The meeting was adjourned.

Detailed discussion on upstream labels
--------------------------------------

Yakov - on P2P media you can't have upstream labels anyway so don't
worry about that.
Eric - good point.
Yakov - in order for any collision detection to work wouldn't each edge
node need to keep info about all other edge nodes? One advantage of LDP
over RSVP is that leaf doesn't need to know about all the other nodes.
So doesn't that defeat the whole purpose of using LDP?
Eric - but any use of upstream assigned labels needs a lookup table
that is specific to the upstream node.
Toerless - I'm worried about performance requirements in routers. We
may be simplifying the control plane at the cost of forwarding plane
performance.
Eric - well yes, this mechanism does add an additional label.
Toerless - can't we look at a solution that has single label (though
that complicates the control plane by dividing the label space). We did
that in 2000 so can be done.
Eric - we could have a protocol to dynamically divide the label space.
But there are still problems identifying the space. Any such mechanism
seems complex/fragile.
Yakov - don't have to have the same mechanism for Ethernet and MP2MP.
So could use 20 bits on Ethernet and on MP2MP take advantage of the
fact you have a root node and then get the root node to assign labels
to the leaf nodes.
Eric - but that has a lot more control protocol (which would make this
as bad as RSVP).
Yakov - but the same is true of collision detection.
Eric - if the root node talks continuously to the leaves using a new
protocol for assigning labels then that's bad.
Jean-Louis - static could be ok. Operator just takes 15 minutes to
assign a label.
Eric - if SPs have consensus that they're OK assigning 20 bit values
then that's great.
Jean-Louis - it's just like a loopback
Eric - if SPs don't care then I don't care.
Toerless - using locally configured loopback as the context is just
like that.
Eric - but 20 bit values are different from IP addresses.
Toerless - what is the most simple protocol that Eric wouldn't consider
fragile and which doesn't depend on the root node? Issue here is that
the Root node may be a P node.
Eric - yep as Toerless said makes it hard to have a redundant root
node.
Kireeti - if providers such as Jean-Louis say static is OK then that's
OK.
Thomas Morin - what about using a perfect hash?
So each node calculates labels assigned to other nodes. Works as soon
as each node knows that another node needs a label. Doesn't need a
protocol but is complex.
Eric - but perfect hashes are like static config but done dynamically.
Problem is that when a new address comes along it breaks the whole
thing.
Larry - I'd advocate static solution as some implementations may want
to compress the 20 bit space to something far smaller.
Eric - thing is that static context labels won't reduce total number of
labels.
Thomas (different Thomas) - don't like idea of static labels as don't
want to manage it.
Jean-Louis - this is same problem as loopback addresses.
Yakov - we need to think through implications of static assignment. If
you have static assignment and you configure the same label for two
upstream PEs then you can end up in a steady state where traffic gets
misdelivered between two VPNs.
Eric - Thomas, are you an SP?
Thomas - yes.
George - feel fairly confident that this discussion will continue on
the list.
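
Editor's illustration (not part of the raw notes): the arithmetic
behind the three context label options from the upstream labels session
can be checked with a short sketch. The function names are hypothetical
and the model is an assumption: it treats two label stack entries as
2 x 20 = 40 usable bits, and treats the random picks of option 3 as
independent and uniform, with no collision avoidance.

```python
import ipaddress

LABEL_BITS = 20  # width of the label field in one MPLS label stack entry


def low_order_context(addr: str, bits: int = LABEL_BITS) -> int:
    """Option 1: context value taken from the low-order bits of an
    IPv4 address."""
    return int(ipaddress.IPv4Address(addr)) & ((1 << bits) - 1)


def labels_per_node(host_bits: int, entries: int = 2) -> int:
    """Option 2: number of UA labels left per node when host_bits of
    the combined label fields are used to identify the node."""
    return 1 << (entries * LABEL_BITS - host_bits)


def collision_probability(nodes: int, bits: int = LABEL_BITS) -> float:
    """Option 3: chance that at least two of `nodes` independent,
    uniform random picks from a 2**bits space are equal."""
    space = 1 << bits
    p_unique = 1.0
    for already_taken in range(nodes):
        p_unique *= (space - already_taken) / space
    return 1.0 - p_unique
```

With these definitions, labels_per_node(32) gives 256 and
labels_per_node(24) gives 65536, matching the figures quoted for option
2. For option 1, low_order_context("10.1.2.3") and
low_order_context("172.17.2.3") return the same value, which is the
concern raised for MP2MP LSPs whose nodes do not share a subnet. For
option 3, collision_probability(100, 24) is smaller than
collision_probability(100, 20), as stated in the session.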