Re: [rbridge] network topology constraints in draft-tissa-trill-cmt-00
Santosh Rajagopalan <sunny.rajagopalan@us.ibm.com> Thu, 05 April 2012 22:48 UTC
To: Donald Eastlake <d3e3e3@gmail.com>
From: Santosh Rajagopalan <sunny.rajagopalan@us.ibm.com>
Date: Thu, 05 Apr 2012 15:27:53 -0700
Cc: rbridge@postel.org, rbridge-bounces@postel.org
Subject: Re: [rbridge] network topology constraints in draft-tissa-trill-cmt-00

Hi, Donald

1) The most significant topology concern I have is that with CMT, the CE
switches that are connected into the rbridges cannot be interconnected - this
forces all east-west traffic to take an extra hop north. This needs to be
well-documented.

2) You're right that STP isn't precluded, but with no interconnects possible
between the CE bridges, it's not needed either.

3) The solution for the CE-rbridge link failure isn't presented in the draft.
My bigger problem is that all the solutions I can think of in the framework of
this draft are expensive to the point of being deal-breakers. Shutting down
*all* the south-facing links on an rbridge when *one* link goes down isn't
workable. Can you explain how having multiple RBv nicknames solves this
problem?

There is one (expensive) solution I can think of, if you have an RBv per CE
switch. In this case the rbridge will only advertise a virtual nickname if the
link which has been "assigned" to that RBv is still up. This solution,
unfortunately, expands the databases and state everywhere, in addition to
using up our 16-bit nickname space faster. Is this the solution the draft
authors will be going towards?

--
Sunny


From: Donald Eastlake <d3e3e3@gmail.com>
To: Santosh Rajagopalan/Santa Clara/IBM@IBMUS
Cc: rbridge@postel.org
Date: 04/05/2012 01:15 PM
Subject: Re: [rbridge] network topology constraints in draft-tissa-trill-cmt-00
Sent by: rbridge-bounces@postel.org


Hi Sunny,

On Wed, Mar 28, 2012 at 6:00 PM, Santosh Rajagopalan
<sunny.rajagopalan@us.ibm.com> wrote:
> This is a clever draft, but I wanted to point out some network topology
> constraints in the proposal:
>
> 1) It looks like you can only use this proposal if the CE switches are *not*
> interconnected in any fashion outside of the trill network. This is because
> sending packets into the CE network strips off information needed to prevent
> loops. Let me illustrate using the example from the draft: RB1 receives a
> multidestination packet from the trill campus on tree 1, and it also has an
> "affinity" for that tree. So it decaps it and sends the packet into the CE
> network (basically, a copy gets sent to each of CE1..CEn).
>
> Let's assume that the CE network is composed of interconnected switches,
> instead of the isolated switches shown in the picture. This is reasonable,
> because it avoids needing to take the extra hop to the aggregation layer for
> end-systems on the same chassis or rack. This means that a broadcast packet
> would be replicated by the CE network to each of its switches, including
> CE1..CEn. So CE1..CEn just got their first duplicate. Each of these
> switches looks at the attached rbridges as edge ports, so it sends them a
> copy. Now, rbridges RB1..RBk each label the ingressing packet with their
> respective "affinity" tree labels and send it into the TRILL network,
> where it gets to the edge of the trill network, and the cycle repeats. You
> now have a loop.
>
> In addition, if you had interconnectivity between the CE switches, then the
> edge rbridges would be able to exchange LSPs with each other, which will

(I think you mean Hellos.)

> result in one of them being elected the AF. The others will then not encap
> or decap CE packets. So the affinity based approach would conflict with RFC
> 6325. All in all, we need the CE switches to be isolated here.

OK. The technique in this draft is an optimization to spread load more
evenly and improve failover. If there are cases where you can't use
this technique, then, in those cases, you can look for other ways to
accomplish either or both such improvements.

> 2) Applying an STP-based solution like the one described in RFC 6325 ("The
> Spanning Tree Solution") to break the connectivity between the CE switches
> won't work here, because this will render certain switches unreachable on
> some trees. In figure 13 ("wiring closet topology") in RFC 6325, if RB1 has
> an affinity for tree k, then packets coming in from the trill cloud on that
> tree will need to get to B2 through RB1, but since STP has blocked B1-B2
> this won't happen. This just reiterates that no form of interconnect
> whatsoever between the CE switches is permissible, and "The Spanning Tree
> Solution" will not work here.

I don't understand your comment #2 above. Assuming use of the
technique in RFC 6325 A.3.3 so the link between B1 and B2 is blocked,
you imply that multi-destination frames for B2 must somehow be
delivered via RB1->B1->B2, which will not work. But why can't they be
directly delivered via RB1->B2, since I believe we are assuming that
both B1 and B2 are each directly connected to both RB1 and RB2?

After all, with the B1-B2 link blocked, there isn't any way that RB1
and RB2 can tell if that link is still up. If someone just snipped the
cable between B1 and B2, what would change at RB1 or RB2? Yet you
agree, I believe, that if B1 and B2 had never been connected, it would
work...

> 3) In addition to the above constraint, each CE switch needs to be connected
> to every rbridge, and the consequences of any of the LAG links going down are
> catastrophic. Also, each rbridge needs to have a vLAG to each CE switch
> in the LAN. This is necessary because the entire CE network has been
> "emulated" by the pseudo-rbridge (RBv) in the draft.

This depends on whether one RBv is for all the CEs or you have
multiple RBv nicknames. It seems to me you at least need a different
RBv for each different set of RBridges whose links are aggregated.
(Except, of course, if you get down to one rbridge, you can just use
that rbridge's nickname for the ingress...)

> Let's say a packet arrives from the trill core at a certain rbridge on a
> tree that it has an affinity for. The assumption is that by decapsulating
> the packet and sending it to each of the attached CE links, all the stations
> in the CE network will get the packet. So if there's a certain CE switch
> which isn't connected to this rbridge, it will not get the packet (the
> packet can't get to the CE switch through another CE switch, because of the
> constraint in 1) above).
>
> This means that a) each rbridge needs to have n vLAGs, one for each CE
> switch, and b) each CE switch needs to have k ports in its LAG, one for each
> rbridge. Note that most switches have scalability constraints on the number
> of LAG members and on the number of vLAGs. For small networks this may not
> be a problem, however, this may still be a problem if one of your links on
> the LAG goes down. In that case, that CE switch will get permanently
> blackholed for some trees. (Essentially, the upstream rbridge on the other
> end of the down link no longer has any way of reaching the CE switch on the
> x trees it has an affinity for)
>
> At the very least, this proposal needs a way for an rbridge to "relinquish"
> its affinity trees when any vLAG link goes down, and a way for those trees to
> either be retired or be picked up by other rbridges. In addition, the
> rbridge will need to bring all of its CE-facing links down, so that the CE
> bridges don't try to use that rbridge to inject packets into the TRILL
> network.

Lots of things can happen. Links can fail or come up. RBridges can
fail or come up. The campus can be re-configured to increase or
decrease the number of trees. I don't think any of the proposals
specifies how to handle all these events.

> 4) Because of the constraint imposed by 1), you cannot interconnect two
> trill clouds using an intermediate CE cloud - the trill clouds will need to
> be merged using p2p trill links. This could be a problem if you plan to
> incrementally upgrade your switches to trill, as opposed to a fork-lift
> upgrade of your whole data center.

As far as I can tell, you mean "bridged LAN" when you say "CE cloud".
Of course you can connect trill clouds with a bridged LAN to form a
single campus, you just can't use this particular technique at the
same time. Considering the bridged LAN as a multi-access transit link,
if you follow typically recommended network design and do not put end
stations on that link, it doesn't make much difference that you can't
use this technique.

I don't think it has anything to do with "p2p trill links". TRILL has
always fully supported multi-access links. There is no problem using
one or more multi-access and/or p2p physical links to connect RBridges
to a bridged LAN as part of a TRILL campus whether or not the topology
is such that that bridged LAN is the only connection between two parts
of the campus.

Thanks,
Donald
=============================
 Donald E. Eastlake 3rd   +1-508-333-2270 (cell)
 155 Beaver Street, Milford, MA 01757 USA
 d3e3e3@gmail.com

> Note that the existing version of RFC 6325 does not have constraints on
> interconnectivity of CE switches or rbridges as described above.
>
> Thoughts?
>
> --
> Sunny Rajagopalan

_______________________________________________
rbridge mailing list
rbridge@postel.org
http://mailman.postel.org/mailman/listinfo/rbridge
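
To make the blackholing concern in point 3 of the quoted message concrete, here is a minimal Python sketch. It is not taken from the draft. It assumes the CE switches are isolated from each other (constraint 1), that exactly one rbridge holds the affinity for each tree, and that this rbridge can only deliver a multidestination frame to CE switches it has a working link to. The function name `blackholed` and the data layout are hypothetical, chosen only for illustration.

```python
def blackholed(links, affinity, ce_switches):
    """Return {ce: [trees]} listing the trees on which each CE switch
    can no longer receive multidestination frames.

    links       -- set of (rbridge, ce) pairs whose link is currently up
    affinity    -- dict mapping tree id -> the rbridge with affinity for it
    ce_switches -- iterable of CE switch names

    Assumption (hypothetical model): CE switches are not interconnected,
    so the affinity rbridge is the only path onto each CE for that tree.
    """
    result = {}
    for tree, rb in affinity.items():
        for ce in ce_switches:
            if (rb, ce) not in links:
                result.setdefault(ce, []).append(tree)
    return {ce: sorted(trees) for ce, trees in result.items()}


ces = ["CE1", "CE2"]
affinity = {1: "RB1", 2: "RB2"}
full_mesh = {(rb, ce) for rb in ("RB1", "RB2") for ce in ces}

# With every CE switch linked to every rbridge, nothing is lost.
print(blackholed(full_mesh, affinity, ces))

# Take down the single RB1-CE2 link: CE2 stops receiving tree 1.
print(blackholed(full_mesh - {("RB1", "CE2")}, affinity, ces))
```

Under these assumptions a single link failure is enough to cut one CE switch off from every tree its peer rbridge has an affinity for, which is why the quoted message asks for a way to relinquish or reassign affinity trees.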
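
The "RBv per CE switch" idea raised in point 3 of the reply at the top of this message can be sketched the same way. This is only an illustration of the rule as described there (an rbridge advertises a virtual nickname only while the link "assigned" to that RBv is up), not something the draft specifies. The names `advertised_nicknames`, `rbv_of_ce` and `link_up`, and the nickname values, are hypothetical.

```python
def advertised_nicknames(rbridge, rbv_of_ce, link_up):
    """Return the sorted virtual nicknames this rbridge would advertise.

    rbridge   -- name of the rbridge making the advertisement
    rbv_of_ce -- dict mapping CE switch -> virtual nickname (one RBv per CE)
    link_up   -- dict mapping (rbridge, ce) -> True if that link is up

    Assumption (hypothetical model): a missing entry in link_up means
    the link is down or was never provisioned.
    """
    return sorted(
        nick
        for ce, nick in rbv_of_ce.items()
        if link_up.get((rbridge, ce), False)
    )


# Made-up nickname values, one virtual nickname per CE switch.
rbv_of_ce = {"CE1": 0x1001, "CE2": 0x1002, "CE3": 0x1003}
link_up = {("RB1", "CE1"): True, ("RB1", "CE2"): False, ("RB1", "CE3"): True}

# RB1 keeps advertising the RBv nicknames for CE1 and CE3 only.
print([hex(n) for n in advertised_nicknames("RB1", rbv_of_ce, link_up)])

# Cost noted in the message: one nickname consumed per CE switch.
print(len(rbv_of_ce), "virtual nicknames consumed")
```

The sketch shows both sides of the trade-off mentioned in the message: a failed link withdraws only one nickname rather than forcing all south-facing links down, but the number of nicknames (and the associated state) grows with the number of CE switches.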