TRILL Working Group Agenda, IETF 82
===================================

Grand Hyatt Hotel, Taipei, Taiwan
Chairs: Erik Nordmark (Cisco), Donald E. Eastlake 3rd (Huawei)

=============================================
SESSION 1 of 2, TUESDAY 1300-1500, Room 102
=============================================

Blue sheets circulated.

Erik Nordmark (co-chair): Welcome. Jon Hudson (Brocade) volunteered to take notes. Is there another note taker and a Jabber scribe...?

Donald Eastlake (co-chair): [See slides: "Agenda, etc."] Drew attention to the Note Well. We have two sessions, 2 hours today and 1 hour tomorrow, both of which are pretty full. The agenda is divided by topic:
   TRILL Edge
   Link Technologies
   TRILL Protocol
   OAM

[Room lights turned down so the slides could be seen more easily.]

Donald: Agenda bashes?

Dinesh Dutt (Cisco): Due to a conflict between L3VPN and tomorrow's session, I would like some OAM and fine-grained labeling presentations moved to today.

Donald: We also have a request from one speaker not to move their presentation to tomorrow's session. Perhaps we can just swap the OAM and Edge sections? And move fine-grained labeling to the start of the Protocol section. Any objection? The plan had been to strictly enforce time early on to give any spare time to OAM... we may need to relax OAM time limits if we move it early.

Erik: The TRILL Edge section would just fill tomorrow. But if Linda Dunbar needs to talk today, we need to move something else to tomorrow.

[General agreement]

Donald: [Continuing with Administrivia slides] Charter milestones. Document status. Draft dependencies. Open source implementations. Code point allocations. Any comments or corrections? [none]

ICMP based OAM Solution for TRILL, Tissa Senevirathne (Cisco)
=============================================================
draft-tissa-trill-oam-00.txt

Tissa Senevirathne: [See slides: "ICMP Based OAM"] Holistic approach. Draft authors are myself, Dinesh Dutt (Cisco), and Vishwas Manral (HP).
Re-use existing ICMP tools, including from RFC 4379, but customize them for TRILL. Customers want a unified tool. Use IS-IS GENAPP to distribute OAM information. Built around request, response, and notification messages. OAM packets must be able to follow exactly the same path as the customer data. Current status of the proposal. Comments received so far:
   - this fills existing gaps;
   - ICMP echo response is not currently extensible according to the RFCs -- but we have developed a way to do that;
   - how to limit OAM reporting scope -- use capabilities in the IS-IS multi-instance draft;
   - how to implement the Alert message -- use TRILL alert.

Linda Dunbar (Huawei): What is the relationship between the request message and the ping message? Where does the diagnostic payload come from?

Tissa: Since you want the request to follow the same path, you put a payload in place. It comes from a command line or from monitoring traffic.

Thomas Narten (IBM): You call it a "diagnostic payload" but it is really just the first part of the data whose path you want to follow...

Linda: As a network manager, how would I get this data? Do you have a database of all user data?

Tissa: No. If someone calls with a complaint about communicating from A to B, you can get the IP addresses. There is a table in the draft of defaults to use if you don't have specific data. If the IP addresses are not specific enough, eventually you can drill down to the MAC addresses or whatever...

Dinesh: With IP, we have Ping and Traceroute, but often they do not follow the same path as the failing data. You can do that, but if it is not enough, you can make the probes more specific. Maybe most traffic from A to B works but HTTP traffic fails. You can drill down as much as you want.

David Allen (Ericsson): I understand impersonating data, but how does the RBridge I want to query know to terminate the OAM packet?

Tissa: There is a whole section of the draft on that and a slide I didn't get to. It depends. There is an EtherType we can use for Layer 2.
Well-known source address or DMAC. Or we can use the hop count = zero technique.

Kireeti Kompella (Juniper): I'm a co-author of RFC 4379. There is one thing you didn't borrow from 4379: we didn't use ICMP. And we have downstream mapping... which helps to explore all paths.

Tissa: We could use downstream mapping as an extension to help explore everything. But it is not too helpful for debugging a specific user complaint.

Jon Hudson (Brocade): Customers are very concerned about congestion. I expect customers to run this tool all the time to get real-time results. I'm very excited. This is Good Stuff.

Yizhou Li (Huawei): Implementation difficulties -- once we strip the TRILL header we have an Ethernet frame. So more of the protocol stack is involved. ICMP is not normally a very sophisticated protocol stack.

Dinesh: That is an implementation question.

Tissa: Generally I think you just have to swap the ingress and egress nicknames to send a response.

Yizhou: If you have an MPLS error, when you strip the MPLS header you have an IP header, which fits with ICMP and swapping the source and destination IP addresses. But here, you may have an arbitrary Ethernet frame after stripping the TRILL header...

Dinesh: Let's take this question offline.

Thomas Narten: Thanks for this draft. We need to look into this.

OAM Tool for Multi-Destination OAM, Yizhou Li (Huawei)
======================================================
draft-yizhou-trill-multi-destination-ping-01

Yizhou Li: [See slides: "Multi-Destination OAM"] Multi-destination RBridge OAM. We already have the RBridge Channel draft and David Bond's unicast ping and traceroute. This draft covers multi-destination frames, distribution trees, etc. We want to know the leaves (egresses) for a particular VLAN/tree and find which links/RBridges are failing in multicast distribution. Uses a new RBridge Channel protocol. Similar to MPLS LSP Ping. Can operate with or without replies. Can include additional information in TLVs: target RBridge, jitter. Examples.
What does the group think about this sort of multi-destination path testing? Should we use a different Channel protocol, or the same one as unicast but with a different SPID? We want to be sure the frame is subject to the same pruning as a real data frame. This is difficult due to the fixed DMAC in the RBridge Channel draft. But we want the real DMAC and maybe a special source MAC or perhaps a TLV...

Tissa: The Channel draft says the DA must be the all-egress MAC address. A deviation from the Channel draft is needed, right?

Yizhou: Yes, for pruning some change is needed.

Tissa: I don't know if you have seen the latest 6326bis, but there is an extension for IP multicast (S,G) mapping. How can you address that with this proposal?

Yizhou: Yes, but pruning is optional.

Tissa: Yes, pruning is optional, I guess, but how do you detect whether it is in effect?

Yizhou: We do have problems with that.

Tissa: Existing Layer 2 switches do (S,G) pruning based on IGMP snooping, so I think TRILL should be able to also.

Yizhou: Pruning is silicon dependent. Pruning on VLAN+IP is fine-grained pruning, but you get a lot of pruning table entries.

Tissa: I think we need to design for future extensibility, including IP-based pruning.

TRILL over MPLS PSN, Lucy Yong (Huawei), Donald Eastlake 3rd (Huawei)
=====================================================================
draft-yong-trill-trill-o-mpls-00

Lucy Yong: [See slides: "TRILL over MPLS"] This is a new draft on TRILL over MPLS. It describes use cases where MPLS can connect RBridge sites to form one campus. Two approaches: point-to-point or multi-access. It also defines a new hierarchical approach using MPLS and TRILL. The draft is informational because it does not require any changes to the MPLS or TRILL standards. [Skipped slides explaining TRILL, as these were only included for presentation to other WGs.] Described the point-to-point case. The multi-access case, which TRILL supports, uses VPLS. Hierarchy with TRILL access and an MPLS/IP core -- very scalable. We would like review and feedback.
Hoping to add auto-configuration for hierarchical use.

Donald: We are running a bit behind and just have time for one question.

Charles (Cisco): Slide 6, Ethernet interface? How common is this configuration?

Lucy: We are just showing various examples to illustrate the flexibility.

TRILL over IP, Margaret Wasserman (Painless Security), Zhang Dacheng (Huawei)
=============================================================================
draft-mrw-trill-over-ip-00

Margaret Wasserman: [See slides: "TRILL over IP"] TRILL over IP draft. TRILL currently defines how it can run over Ethernet and PPP. This draft explains how to run TRILL over UDP/IP, either IPv4 or IPv6. The draft just defines an encapsulation and does not change the TRILL protocol. Two scenarios are described. TRILL runs over multi-hop IP networks that may or may not support multicast. In some scenarios, security over the IP link is critical; in others, the IP link is under the same administrative control as the RBridges and security is less important. The encapsulation uses the UDP port number to distinguish TRILL Data and TRILL IS-IS. DTLS (Datagram TLS) is used for security, but DTLS does not support multicast. So in the secured multi-access case, serial unicast is necessary.

Puneet Agarwal (Broadcom): TRILL packet vs. TRILL payload in the slides?

Margaret: It is a whole TRILL packet (IS-IS or Data) that is encapsulated in UDP.

Puneet: So, this is TRILL over IP in the same way that the current TRILL specification is TRILL over Ethernet?

Margaret: Exactly.

Sue Hares (Huawei): This looks very similar to what was done in LWAPP (CAPWAP). Can you comment on similarities and differences and possible difficulties that CAPWAP revealed?

Margaret: This is very similar in some ways, because LWAPP/CAPWAP also had a data channel and a control channel. But this is somewhat different. The problem we had there was that DTLS wasn't ready. But due to CAPWAP, DTLS is now more mature and better deployed.
Sue: Part of the problems were with interrupted or aborted sessions, although mostly with the control plane. I think the data plane problems were all implementation issues.

Margaret: We had a lot of early problems in CAPWAP because none of the several DTLS stacks would interoperate, because they were all wrong. But we got that fixed and they do work now.

Tissa: Have you seen OTV? It is similar.

Margaret: No.

Tissa: I'll send you the links. On multicasting, are you restricting this to private networks, based on the IPv6 multicast prefix you are suggesting?

Donald: No. The multicast addresses given are defaults at best. But we are out of time and need to take this question to the list.

Erik: How many people have read this? Few. I suggest people read it, and then we can discuss on the list whether we want to adopt it.

Clarifications and Corrections & RFC6326bis, Donald Eastlake (Huawei)
=====================================================================
draft-eastlake-trill-rbridge-clear-correct-01
draft-eastlake-isis-rfc6326bis-01

Donald Eastlake: [See slides: "Clarifications and Corrections"] Fixes three errata to RFC 6325 that were posted recently, and also updates RFC 6327. Includes clarifications based on questions asked on the mailing list. Covers considerations for overloaded and unreachable RBridges. Distribution tree root determination: don't use unreachable RBridges as tree roots. Specifies an optional feature so that RBridges in overload can originate multi-destination frames. This feature is intended for bootstrap kinds of things, like DHCP, so it is OK to implement it in the slow path. Has some fixes related to nicknames. Has a section on exactly what the various MTU numbers mean -- that section needs more work. Updates RFC 6325 for the backwards-incompatible change made by IEEE 802.1 changing the CFI bit in a C-VLAN tag to a DEI bit. Updates RFC 6327 to provide that LSP synchronization is done in both the two-way and report states.
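As an illustration only (this was not presented at the meeting): the CFI-to-DEI change mentioned above leaves the bit in the same position within the 16-bit 802.1Q Tag Control Information (TCI) field; only its meaning changes. The minimal Python sketch below shows where that bit sits between the 3-bit priority and the 12-bit VLAN ID. The type and function names are hypothetical and are not taken from any draft or implementation.

```python
from typing import NamedTuple


class VlanTci(NamedTuple):
    """Fields of a 16-bit IEEE 802.1Q Tag Control Information value."""
    pcp: int  # Priority Code Point, top 3 bits
    dei: int  # Drop Eligible Indicator, 1 bit (the position formerly called CFI)
    vid: int  # VLAN ID, low 12 bits


def parse_tci(tci: int) -> VlanTci:
    """Split a 16-bit TCI into its PCP, DEI, and VID fields."""
    if not 0 <= tci <= 0xFFFF:
        raise ValueError("TCI must fit in 16 bits")
    return VlanTci(pcp=(tci >> 13) & 0x7, dei=(tci >> 12) & 0x1, vid=tci & 0x0FFF)


def build_tci(pcp: int, dei: int, vid: int) -> int:
    """Pack PCP, DEI, and VID back into a 16-bit TCI."""
    if not (0 <= pcp <= 7 and dei in (0, 1) and 0 <= vid <= 0x0FFF):
        raise ValueError("field out of range")
    return (pcp << 13) | (dei << 12) | vid


# The same bit that used to be read as CFI is now read as "drop eligible".
tag = parse_tci(0xB064)
print(tag)  # VlanTci(pcp=5, dei=1, vid=100)
assert build_tci(*tag) == 0xB064
```

Because only the interpretation of the bit changed, code that merely copies the tag is unaffected; what matters is whether an implementation acts on the bit as CFI or as DEI.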
Erik: How many people have looked at this draft? A handful. People should read this document and discuss it on the list.

Donald: Due to the shortness of time, I'd like to keep moving along. People are welcome to ask questions on the list.

Donald: [See the slides: "RFC6326bis"] Intended to obsolete the current RFC 6326, "TRILL use of IS-IS". Changes a couple of TLVs, but in a backward-compatible fashion. Adds some sub-TLVs for the Group Address TLV, including group sub-TLVs for IP address groups, and adds versions with the VLAN field replaced by a 24-bit field. Adds another way to report which VLANs you are AF for, which could be used in some ways to reduce the number of Hellos. Adds versions of other sub-TLVs with the VLAN field replaced by a 24-bit field. These 24-bit fields in sub-TLVs are independent of how the 24 bits are encoded in the data plane. Adds facilities for reporting version and capabilities, including RBridge Channels. Allows the SNPA field ("MAC address") in the TRILL Neighbor TLV to be a length other than 6 bytes. Requires LSP number zero to be no larger than 1470 bytes and the originatingLSPBufferSize TLV to be in LSP number zero.

Erik: We need to speed up. I recommend that people read this draft. It is for the IS-IS WG, but we need to be sure it matches what we are doing.

Fine Grained Labeling, Donald Eastlake (Huawei)
===============================================
draft-eastlake-trill-fine-labeling-02.txt

Donald Eastlake: [See the slides: "Fine Grained Labeling"] A way to label TRILL Data with more than a 12-bit VLAN label. There is no change in the outer labeling of frames on an Ethernet link. There is still just a VLAN tag (or none, if suppressed by the output port) on the outside. There is no effect on the Appointed Forwarder or Designated VLAN logic.
This was presented at the last IETF meeting, and an overwhelming number of people in the room wanted it adopted as a WG draft, but there were complaints about it using two C-VLAN tags in a row inside the TRILL payload. In the meantime, the IEEE Registration Authority has given TRILL a new EtherType, 0x893B, for use in this area. The idea is to have an inner 24-bit label, with a flag bit in the header indicating whether you are using VLAN-labeled data or fine-grain-labeled data. Obvious possibilities are a C-tag followed by an EX-TAG (the new EtherType) or an EX-TAG followed by a C-tag. See the slides that compare these two orderings from the point of view of backwards compatibility and other factors. I would like to update the draft, which currently says TBD in some places.

Erik: How many people have read this draft or an earlier version? Not a large number. How many people want to make this a WG document? Six. How many are against making it a WG document? Maybe one. We definitely need to check this on the list. I recommend more people read this draft. We also need to talk about these options, but on the mailing list.

Thomas Narten: This is an important document. The WG needs to review it and decide quickly. There is a lot of work going on with multi-tenancy. This is not a minor change. Let's not just plod along.

Donald: I agree that we should expedite a decision, but it needs to be done on the list.

Thomas: OK, but the WG needs to decide soon, not just revisit this in three months.

Directory Assisted TRILL Edge/Encapsulation, Linda Dunbar (Huawei)
==================================================================
draft-dunbar-trill-directory-assisted-edge-03
draft-dunbar-trill-directory-assisted-encap-01

Linda Dunbar: [See the slides: "Directory Assisted Edge"] I see most people are looking at their computers. I hope they are looking at the slides, not email :-) As this is the last presentation, I'll make it quick.
This draft was presented in Quebec City at the last IETF and I got lots of comments, in particular that there are two concepts: using a directory, and using a non-RBridge node to do the encapsulation. Several people asked for these to be separated, so I did that. The second draft (assisted-encap) is heavily dependent on the first draft.

TRILL in the highly dynamic data center environment: servers need to move and get reloaded with new applications that may be in a different subnet. The data center is different because there is an orchestration system that knows where all the servers are, so the MAC address to RBridge mapping is known. Using this, you can avoid flooding. RBridges should not receive unknown DMACs, and if one does, it can just drop the frame. So you no longer need the Appointed Forwarder mechanism. ... This makes it possible to use multiple ports without worrying about AF or flooding. Directory assistance avoids the need for ARP/ND, including gratuitous ARP.

There are two models, push and pull. Push is good for more static models, but you may have more entries than you need because the mechanism does not know the actual traffic flows. With the pull model, you get the directory entries only when you need them. The behavior is similar to routers today: a Layer 3 router that doesn't know the MAC address for a packet holds the packet and ARPs, and if it gets no response after a few tries, it drops the packet. So with the pull model you have fewer cache entries.

Linda Dunbar: Non-RBridge node doing encapsulation: any node that knows the egress can do the encapsulation. It can set the ingress to the edge RBridge or can create a phantom node/nickname.

Erik: How many people have read this draft or an earlier version? Do people have opinions on whether this should be a WG document?

Dinesh Dutt: I am interested in this. The problem is important. I might disagree with solution specifics in the drafts, such as adding information to ARPs, but that is a detail. Just having a directory-assisted edge is fine.
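As an illustration only (this is not the interface defined in the drafts): a minimal Python sketch of the pull model Linda described, in which an edge node asks the directory for a MAC-address-to-egress-nickname mapping only when a frame needs one, caches the answer, and drops rather than floods frames whose destination the directory does not know. The class name and the `lookup` callable standing in for the directory query are hypothetical.

```python
from typing import Callable, Dict, Optional


class PullModeEdgeCache:
    """Hypothetical edge cache that pulls MAC-to-egress mappings on demand."""

    def __init__(self, lookup: Callable[[str], Optional[int]]) -> None:
        self._lookup = lookup            # stands in for a query to the directory
        self._cache: Dict[str, int] = {}
        self.directory_queries = 0       # how often the directory had to be asked

    def egress_for(self, dest_mac: str) -> Optional[int]:
        """Return the egress RBridge nickname, or None to drop the frame."""
        if dest_mac in self._cache:
            return self._cache[dest_mac]
        self.directory_queries += 1
        nickname = self._lookup(dest_mac)
        if nickname is not None:
            self._cache[dest_mac] = nickname  # only flows actually seen get cached
        return nickname                       # None: unknown DMAC, drop, no flooding

    def cached_entries(self) -> int:
        return len(self._cache)


# Toy directory; in a data center this knowledge would come from orchestration.
directory = {
    "00:00:5e:00:53:01": 0x1234,
    "00:00:5e:00:53:02": 0x5678,
}
edge = PullModeEdgeCache(directory.get)
print(hex(edge.egress_for("00:00:5e:00:53:01")))  # 0x1234, fetched from the directory
print(hex(edge.egress_for("00:00:5e:00:53:01")))  # 0x1234, served from the cache
print(edge.egress_for("00:00:5e:00:53:99"))       # None: drop instead of flooding
print(edge.directory_queries, edge.cached_entries())  # 2 1
```

A push-model variant would instead preload every directory entry up front: simpler, but, as noted in the presentation, it may hold entries for flows that never occur.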
Thomas Narten: Let me be a troublemaker. Let's not just ask whether this should be a WG draft; let's step back and ask which things should be high priority for the WG in the short term, and which should just be in the background and given less priority. The WG needs to focus. We need to get a core completed and out in the field.

Tissa (?): I second Thomas. A very, very important point.

Donald: My opinion is that the highest priorities are OAM and fine-grained labeling. Any objections to that? [none]

Dinesh: I support that. But going back to Thomas's point, we must prioritize. Things should move faster.

Erik: I think the way to get things to move faster could be centralized or distributed. If centralized, we decide what to concentrate on. If distributed, each person decides what they think is important and wants to spend cycles on.

Ralph Droms (Internet AD): I think you should follow both strategies. The WG should decide very quickly, like this week, on the top 3 or 4 documents and concentrate on them. Beyond that, it could be up to what people want to work on.

Donald: OK. We are beyond our time. Please sign the blue sheets if you haven't yet. We have a session tomorrow; see you there. Can someone turn the room lights back up?

================================================
SESSION 2 of 2, WEDNESDAY 1510-1610, Room 101B
================================================

[The sound quality of the recording was markedly inferior for Session 2 compared with Session 1, so these minutes may be less accurate.]

Donald: The text agenda has been updated to show what is actually left after yesterday's session. Also, the time allowed for each talk has been reduced to make the talks fit, assuming speakers stick to the new times. See the Note Well slide.
TRILL IS-IS MTU Negotiation, Mingui Zhang (Huawei)
==================================================
draft-zhang-trill-mtu-negotiation-01

Mingui Zhang: [See the slides: "MTU Negotiation"] A TRILL MTU negotiation problem was detected by a vendor. TRILL determines Sz based on the whole campus, so the link MTU test depends on a campus-wide determination, which can change when RBridges enter or leave the campus and can cause problems. [Examples in slides.] We solve the global dependency by introducing a new link MTU value, Lz, and an MTU testing algorithm based on binary search. The Clarifications and Corrections draft already clarifies that LSP synchronization starts in the two-way state. But also, CSNPs and PSNPs should not be limited by Sz but by the new Lz. Traffic MTU is different and is constrained by the port MTU.

Donald: How many people have read the draft? 4 or 5. I suggest that people read the draft. It has good examples in it, and we should consider how it would fit into the TRILL documentation. [No one came to a microphone to comment.]

TRILL Header Extension, Donald Eastlake (Huawei)
================================================
draft-ietf-trill-rbridge-extension-00

Donald Eastlake: [See the slides: "Header Extension"] The TRILL Header has an extension feature, with its size indicated by a length field in the Header. At the last meeting an extensions draft was presented, but the reaction at that meeting, confirmed on the mailing list, was that the draft was too complex and speculative -- that it should be trimmed down to what is actually motivated by shorter-term requirements. So I've trimmed down the draft; actually, I split the -options-05 draft into this simplified draft (-extension-00) and an -options-06 draft that will be posted later this week. It specifies one specific flag in the extension, an Alert flag for use in connection with RBridge Channel. So I've tried to give people what they asked for, and I think this draft could be WG last called soon.
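As an illustration only (not part of the presentation): in the TRILL Header defined in RFC 6325, the length field Donald mentioned is the 5-bit Op-Length, which gives the size of the extension area in units of 4 octets. The first 16 bits carry a 2-bit version, 2 reserved bits, the 1-bit multi-destination (M) flag, Op-Length, and a 6-bit hop count; the 16-bit egress and ingress RBridge nicknames follow, and then Op-Length times 4 octets of extension. The minimal Python sketch below locates the extension bytes; the type and function names are hypothetical, and it does not interpret any flags inside the extension.

```python
import struct
from typing import NamedTuple


class TrillHeader(NamedTuple):
    version: int
    multi_destination: bool
    op_length: int          # extension length in 4-octet units
    hop_count: int
    egress_nickname: int
    ingress_nickname: int
    extension: bytes        # raw extension area, not interpreted here


def parse_trill_header(data: bytes) -> TrillHeader:
    """Parse the fixed 6-octet TRILL Header and slice out its extension area."""
    if len(data) < 6:
        raise ValueError("TRILL Header needs at least 6 octets")
    first16, egress, ingress = struct.unpack("!HHH", data[:6])
    version = first16 >> 14                           # bits 15-14
    multi_destination = bool((first16 >> 11) & 0x1)   # bit 11 (bits 13-12 reserved)
    op_length = (first16 >> 6) & 0x1F                 # bits 10-6
    hop_count = first16 & 0x3F                        # bits 5-0
    ext_end = 6 + 4 * op_length
    if len(data) < ext_end:
        raise ValueError("truncated header extension")
    return TrillHeader(version, multi_destination, op_length, hop_count,
                       egress, ingress, data[6:ext_end])


# Example: one 4-octet extension word, hop count 63, unicast (M = 0).
first16 = (1 << 6) | 63
frame = struct.pack("!HHH", first16, 0x1234, 0x5678) + b"\x00\x00\x00\x01" + b"inner frame"
hdr = parse_trill_header(frame)
print(hdr.op_length, hdr.hop_count, hdr.extension.hex())  # 1 63 00000001
```

Keeping the extension as raw bytes reflects the trimmed-down approach described above: a transit RBridge can skip Op-Length times 4 octets to reach the payload without understanding every flag.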
[No one came to a microphone to comment.]

Erik: How many people have read this draft? Not enough. Either more people have to read it, or we could force more people to read it by doing a WG last call. This draft isn't very long.

Donald: If you read the previous draft, this one is shorter...

Pseudonode Nickname, Zhai Hongjun (ZTE), Radia Perlman (Intel)
==============================================================
draft-hu-trill-pseudonode-nickname-00(01)

Radia Perlman: [See the slides: "Pseudonode Nickname"] This presentation is to explain the issues and subtleties. I don't think we have a full solution yet. If you change the Appointed Forwarder, perhaps because an RBridge dies, the cached MAC-address-to-egress mapping at a remote RBridge is wrong, and data can be black-holed until that stale information times out. One suggestion is to use a pseudo-node to represent the link and put that in as the ingress so it will be learned. Then return traffic could go to any of the RBridges on the link. But there are a bunch of subtleties. There is a concept of an access link, but it gets a bit odd if you use a pseudo-node. If a TRILL Data frame returns to an RBridge other than the Appointed Forwarder, that RBridge would typically have to send it to the AF for decapsulation. Then there are links where the RBridges can't actually talk to each other because the device they are directly connected to is a bridge doing link aggregation across the links to the RBridges. Then how do the RBridges coordinate? You also want to re-use the pseudo-node nickname if the DRB changes, but if a new RBridge comes up with higher priority, it might not know what that nickname was, so maybe there should be some way for the other RBridges on the link to tell it. Then there is the question of what to do if the link partitions. ...
Then there is the Reverse Path Forwarding check -- you have to make it look to the rest of the campus as if a frame with the pseudo-node nickname as ingress was received on the tree used for the frame, which may require the AF to forward the frame to some other RBridge on the link to send on the tree. Another issue is that, commonly, the amount of multi-destination traffic on the link is doubled. One solution is to avoid being Appointed Forwarder if there isn't a tree you can conveniently forward on. It's all sort of hairy.

Erik: Any comments? [No one came to a microphone to comment.]

Radia: All the people with strong opinions have probably gone to the L3VPN meeting, resulting in a mellow audience :-)

RBridge Aggregation, Mingui Zhang (Huawei)
==========================================
draft-zhang-trill-aggrgation-01

Mingui Zhang: [See the slides: "RBridge Aggregation"] Link aggregation is widely used in Layer 2 to overcome bandwidth limitations. TRILL limits handling of each VLAN at an edge link to a single Appointed Forwarder RBridge. With RBridge aggregation, multiple RBridges can use the pseudo-node nickname for ingress. This way, multiple RBridges can forward frames for the same VLAN on the same link. For multicast decapsulation, you need to avoid multiple copies being sent on the link. Then there is the loop problem of an egressed frame being ingressed again. If we use link aggregation, these problems are avoided. We can also use a local hashing function to choose which RBridge of a local group handles a frame. ... For example, with two RBridges, one could handle frames with a MAC address ending in a zero and the other could handle MAC addresses ending in a one. MAC address learning should be synchronized between the edge group members with ESADI. If an aggregation member fails, the next in the list takes responsibility for the frame.

Tissa: I didn't quite understand how hashing ensures that packets go to an RBridge that is supposed to forward on a particular tree.
Donald: The main use of hashing is on egress, so that when a multi-destination frame arrives at all RBridges in the group, only one egresses it to the link.

Tissa: The native bridge will use its normal hash, so it may give an outgoing multi-destination frame to an RBridge that can't put it on the tree.

Donald: Oh, you are talking about the same thing as Radia's RPF issue... you may be correct that there is a problem. Any further questions? [No]

Multi-homing Connectivity to TRILL Network, Janardhanan Pathangi Narasimhan (Dell)
==================================================================================

Jana: [See the slides: "Multi Homing in TRILL"] The problem statement is the same as in the previous presentation, so I will skip the first few slides. We want to create a virtual RBridge so that remote RBridges think the frame is from the virtual RBridge that represents the link. The handling of unicast is the same as in the previous presentation, so I'll skip over those slides.

Jana: For multi-destination frames, there are three choices for handling the RPF check problem for outbound traffic:
   - pick a tree where the RBridge is on-path
   - make both RBridges roots
   - tunnel if neither of the above is possible

Jana: ... [particularly hard to understand the recording] ...

Tissa: This is a very important problem. Can we work on it offline? This is an important topology.

Erik Nordmark: The failure shown on slide 11 doesn't show up in Radia's way of drawing them. I don't understand whether this is the same case. Radia shows the case with the whole link partitioning. I don't need an answer now...

Radia: It could be my fault that my presentation wasn't clear. I don't understand Jana's presentation yet. If different links have different costs for different trees, that seems very complicated. I have a vague unease about this. As long as you simulate an RBridge at a virtual switch and it looks like the frames come from it, I understand.

Jana: ... [hard to understand the recording] ...

Donald: We are out of time for this session.

Erik: No, we have seven more minutes.

Donald: OK!
So, we have time for the final presentation.

Erik: It is a good suggestion that folks work on the issues in these previous three presentations offline, perhaps after more sleep.

VLAN Assignment, Mingui Zhang (Huawei)
======================================
draft-zhang-trill-vlan-assignment-02

Mingui Zhang: [See the slides: "VLAN Load Balancing"] This is an update on the VLAN assignment draft, which is a solution for VLAN load balancing. The DRB could appoint forwarders such that the load is imbalanced. This is caused by the lack of feedback to the DRB. So we define two sub-TLVs to feed back the bandwidth used and the number of attached MACs. This helps to pinpoint hot spots. Any comments?

Erik: Are you assuming a pseudo-node nickname? I am concerned about the effect of assigning to different AFs based on load. You can't do it too often, since re-learning and packet loss are possible.

Radia: Yes, you don't want to change the AF too often unless you have a pseudo-node nickname. These things are all interrelated. A holistic solution would be great. Maybe we need to set up a smaller mailing list.

Mingui: OK. I will consider that. Thanks for the advice.

Erik: Thanks to all the speakers for keeping us on the tight agenda. Anything else?

Donald: At the last meeting a presentation was made on one way to do FCoE. This was an informational document, since it made no changes to the TRILL (or FCoE) protocols. There was an objection that it wasn't worth the WG's time to process it. It has now been submitted as an independent RFC submission. If anyone feels that its publication would interfere with the TRILL WG, they should speak up. We will also ask on the mailing list.

Erik: OK, that's it. I have one blue sheet here. Please bring up the other one.

[Blue sheets: 60 names listed at the end of the meeting.]