
Re: [PWE3] draft-bryant-filsfils-fat-pw



Luca,

On Dec 11, 2007, at 2:23 PM, Luca Martini wrote:
Tom/Shane,

Let me see if I can attempt to clarify the requirements.
First let's assume that we want to preserve the order of the PW packets; then I see the following situations:


1) PW is of comparable size to some MPLS trunk link in a link bundle (let's say at least 10% of the size of the link).
2) PW is larger than some individual trunk link in some bundle.
3) PW can use network ECMP (no local link bundles are present along the PW path).


We also have the following sub cases.
a) PW contains IP traffic, and can be identified at ingress.
b) PW does not contain IP traffic/traffic cannot be ID/IP traffic is a single huge flow.


Any combination of the above is possible. So I think this draft attempts to solve 3a and 1a.

I agree with you, so far. IMHO, in the context of just 1a vs. 3a, 1a is a more pressing problem than 3a.



However, I think that the most common problem is 1b. Applications that generate a single large flow of PW traffic tend to be enterprise applications, such as database syncing, backups, encrypted 10GE links, etc.


It seems to me that the multiple receiving labels per PW solution is simple, does not change the PWE3 architecture, and does not overly complicate the hardware design of the PE. Since this can clearly be used on an exception basis, it should be adequate to mitigate problem 1a/3a.

In giving this more thought, and in private discussions with some others, the PW Labels Block approach seems more like a 'band-aid' than a relatively long-term fix for the problem at hand. Specifically, consider the following use cases where you [will] see fat PWs:
1) point-to-point VPWS, specifically in a hub-and-spoke environment, e.g.: a Hub site containing a (N x) 10 GbE NNI to another carrier, Enterprise DataCenter, etc. Funneling into that Hub site will be GbE, N x GbE and 10 GbE EoMPLS VCs.
2) multipoint-to-multipoint VPLS. Although I can't say this is a hard requirement at the moment, I would expect that as VPLS adoption continues to grow that the solution for "Fat PW's" being discussed here will, most likely, get re-used in VPLS at some point down the road.
3) perhaps, further out than VPLS, MS-PW ...


Ultimately, the PW Labels Block approach burns LFIB space. This seems to me to be particularly problematic in p2mp and mp2mp topologies, since the impact on LFIB space is felt by all PE nodes using the PW Labels Block approach. In addition, I'm not necessarily convinced the impact on LFIB space will be minor, since the more VC labels that are allocated for a particular PW, the more diverse input is provided to load-hashing algorithms within core LSR's and, ultimately, the more evenly flows are distributed over component-links.

Finally, the PW Labels Block approach is concerning because operators will likely have to experiment to find the 'right' size PW Labels Block for their network, LAG/ECMP sizes, etc. Thus, as their network grows, they'll likely have to go back and resize the PW Labels Block, always trying to trade off LFIB size against even load-balancing ...
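
To make the LFIB-versus-balance trade-off concrete, here is a minimal Python sketch. It assumes a core LSR hashes only on the PW label, and it uses CRC32 as a stand-in for the vendor's proprietary hash. The function names, the base label value, and the link counts are all hypothetical and do not come from the draft.

```python
import zlib


def link_for_label(label: int, n_links: int) -> int:
    """Stand-in for a core LSR's proprietary hash (assumption: CRC32 of the label)."""
    return zlib.crc32(label.to_bytes(3, "big")) % n_links


def block_tradeoff(block_size: int, n_links: int, base_label: int = 100000) -> dict:
    """Report the LFIB cost and the number of component links reachable
    when a PW is given a contiguous block of `block_size` labels."""
    labels = range(base_label, base_label + block_size)
    links = {link_for_label(label, n_links) for label in labels}
    return {"lfib_entries": block_size, "links_used": len(links)}


if __name__ == "__main__":
    # Hypothetical 8-link bundle; try a few block sizes.
    for size in (1, 2, 4, 8, 16, 64):
        print(size, block_tradeoff(size, n_links=8))
```

A block of one label can only ever reach one link. Larger blocks give the hash more distinct inputs, at the cost of one LFIB entry per label on the receiving PE, which is the tuning problem described above.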

Ultimately, BW is growing and shows no signs of abating (BW growth is good for all of us! :-). I don't believe, as you state above, that this solution will only see limited use in "exception cases". Other networks will, if they don't already, need a solution to this same problem. Therefore, I would advocate we think through the design fully and make sure it's: a) easy to configure/use/operate, esp. over long time scales; b) easily extensible to other protocols (e.g.: VPLS); and c) of course, scalable and interoperable with other network elements.



I do not believe that problem 2 is solvable while keeping the packets in order. However maybe that is not a requirement.

In the case of problems in the category "1b", we can solve them by implementation without requiring new protocols.

I'm not sure I follow if, or how, you're proposing to solve 1b, since your (b) above says that: "PW does not contain IP traffic/ traffic cannot be ID/IP traffic is a single huge flow" ... then, you go on to list applications for which there is no way for the ingress PE to identify a microflow in order to assign either a PW Label Block or Load-Balance Label. Can you clarify what you mean above?


Thanks,

-shane



So what is the most pressing problem to solve ?

Luca


Thomas Nadeau wrote:



OK, so speaking as yet another operator....

there's a clear need to support fat PWEs, but I'm yet to be convinced that this draft is the correct solution to the problem.

The intro to the draft talks about the application being to interconnect IP routers. If that's the case then why not use an IP pseudowire? If you do that then there will just be one label, but (AFAIK) many routers will spot the 0x4 (or 0x6) in the first nibble of the payload and do a hash on the IP header - giving optimum traffic distribution and also preserving the order of each flow.
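
As a rough illustration of the first-nibble heuristic described above, the following Python sketch shows how an LSR might choose its hash input. It is an assumption about typical behaviour, not any vendor's actual algorithm; `ecmp_hash_key`, `pick_path`, and the use of CRC32 are hypothetical.

```python
import zlib


def ecmp_hash_key(payload: bytes) -> bytes:
    """Given the bytes that follow the bottom-of-stack label, return the
    bytes to add to the hash. Assumption: the LSR guesses the payload type
    from the first nibble only."""
    if not payload:
        return b""
    version = payload[0] >> 4
    if version == 4 and len(payload) >= 20:
        return payload[12:20]   # IPv4 source + destination addresses
    if version == 6 and len(payload) >= 40:
        return payload[8:40]    # IPv6 source + destination addresses
    return b""                  # not recognised as IP: hash on labels only


def pick_path(labels: list, payload: bytes, n_paths: int) -> int:
    """Choose an ECMP path from the label stack plus any IP addresses found."""
    key = b"".join(label.to_bytes(3, "big") for label in labels)
    key += ecmp_hash_key(payload)
    return zlib.crc32(key) % n_paths


if __name__ == "__main__":
    # Minimal 20-byte IPv4 header: 10.0.0.1 -> 10.0.0.2 (hypothetical addresses).
    ipv4 = bytes([0x45]) + bytes(11) + bytes([10, 0, 0, 1]) + bytes([10, 0, 0, 2])
    print(pick_path([16000, 17000], ipv4, n_paths=4))
```

Packets of one IP flow always yield the same key, so they stay on one path and remain in order. Note that a non-IP payload whose first nibble happens to be 4 or 6 would be misclassified by a check this simple.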

If the payload is not IP then I think we have a problem at any rate, as we don't necessarily know how to identify a "flow". Sure, you could do a MAC hash for an Ethernet pseudowire, but in many cases you see precisely one pair of MAC addresses on the PWE.

Giles

On Nov 28, 2007 2:47 PM, Shane Amante <shane at castlepoint.net> wrote:

   Hi Yaakov,

Yaakov Stein wrote:
> Stewart and other authors
>
> I just finished reading the FAT-PW draft, and have a few
> comments/questions.
>
> 1. The draft says "Operators have requested the ability..."
> Since I have never heard this request from any of the operators with
> which we work, can this be changed to "Some operators have
> requested ..." ?
> Since there is one operator on the author list, I guess we can guess
> which operator has requested this feature !


Speaking as /another/ operator, I can say there is an absolutely strong
need to solve this problem (and has been for quite a long time,
actually). Consider the fact that 10 GbE has become (is becoming?) a
pretty common access circuit to Backbones and that within most SP
networks the dominant Backbone link size is 10G. As you're likely well
aware, the IEEE HSSG is working on both 40 GbE and 100 GbE. Once 40 GbE
is available (and assuming it's used for WAN connectivity, perhaps
similar to 10 GbE LAN PHY), then OC-768c Backbone links will suffer the
same problem. 100 GbE will, eventually, be used as both core and access
links. In short, this problem is not going away. We need to solve it.



Agreed. Speaking as another operator, I too am concerned that we solve
this problem, but I do not like the approaches described in this draft.
> 2. The example given is for Ethernet PWs. Is this draft limited to this
> case?
>     There is discussion of whether it is limited to IP over Ethernet,
>     but this more basic question is not addressed.
>     For example, could this load balancing be performed for ATM PWs
>     based on the AAL5 flows?

From my perspective, Ethernet is far and away the biggest "problem
child" out there today, due to the size of access to Backbone links,
(see above). While it may be admirable to look at making this draft
"generic" for a variety of PW types, I wouldn't lose any sleep if this
draft remained focused on just Ethernet.




> 3. PWs are an emulation of the native service.
> Why is this emulation being called upon to deliver a feature NOT
> present in the native service ?
> Doesn't this break the model a bit?
>
> 4. A native service processing function is required for differentiating
> between different flows at ingress. If this draft is indeed limited to
> Ethernet PWs, such a processing function already exists in the native
> service. 802.3 clause 43 (LAG) defines conversations for exactly this
> purpose (commonly implemented by hashing IP addresses and port numbers),
> and even mentions the use of load balancing in the distribution of
> conversations over links.
> I think this function should be at least referenced.
>
> 5. My greatest problem is with the preferred mode of section 1.1,
> which builds a PW label stack under the MPLS label stack.
> The proposal is for 2 PW labels (once again, somewhat breaking RFC3985).
> Figure 2 is not completely clear about the label structure.
> There are two possibilities:
> 1) both load balancing label and PW label have stack bit set. (I
> hope not !)
> 2) the load balancing label has S=1, and the PW label has S=0.
> So formally, the PW label seems to be an MPLS label.
> Both possibilities break the standard model.
>
> I would certainly like to see more justification of the problem
> before breaking the model in this way.
> Perhaps a short requirements document is in order?


When I read the draft, this is the part I also had the most concern
with. In particular, I like the "simplicity" of the LB Label approach
(i.e.: savings on FIB space, no need to signal first and last labels for
each PW, etc.); however, I am concerned about the implications of, or
potential need to, define a 'generic' MPLS PW label.
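
For reference, the second possibility Yaakov describes (LB label at the bottom of the stack with S=1, PW label above it with S=0) can be sketched using the standard 32-bit MPLS label stack entry layout: 20-bit label, 3-bit traffic class, 1-bit S, 8-bit TTL. The label values and helper names below are hypothetical, and this is only one reading of the draft's Figure 2.

```python
import struct


def encode_entry(label: int, s: bool, tc: int = 0, ttl: int = 64) -> bytes:
    """Pack one MPLS label stack entry into 4 bytes (network byte order)."""
    if not 0 <= label < 2**20:
        raise ValueError("label must fit in 20 bits")
    word = (label << 12) | ((tc & 0x7) << 9) | (int(s) << 8) | (ttl & 0xFF)
    return struct.pack("!I", word)


def encode_stack(labels: list) -> bytes:
    """Encode labels top-first; only the last (bottom) entry gets S=1."""
    last = len(labels) - 1
    return b"".join(encode_entry(label, s=(i == last)) for i, label in enumerate(labels))


def decode_entry(data: bytes) -> dict:
    """Unpack the first 4 bytes of `data` as one label stack entry."""
    (word,) = struct.unpack("!I", data[:4])
    return {"label": word >> 12, "tc": (word >> 9) & 0x7,
            "s": (word >> 8) & 0x1, "ttl": word & 0xFF}


if __name__ == "__main__":
    # Hypothetical values: PSN tunnel label, PW label, load balancing label.
    stack = encode_stack([3001, 20001, 734512])
    for i in range(0, len(stack), 4):
        print(decode_entry(stack[i:i + 4]))
```

With this layout the PW label no longer carries the bottom-of-stack bit, which is the departure from the existing model that the comment above objects to.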



In addition to this, I suggest that the requirements first be investigated before we go ahead with this solution. Speaking as someone who needs to make different boxes interoperate in a network, I would prefer a SINGLE solution to this problem. When we have different protocols, it is generally ok to have different approaches, but having different approaches in this case seems to make things exponentially harder.


--Tom


My primary concern is future extensibility. Specifically, in case there
are /other/ applications, which may or may not have been brought to the
surface yet, that may have similar needs/desire for a 2nd PW label. If
that ultimately means we gain consensus to amend the PWE3 Architecture,
I'm OK with that, but certainly we would need to have more discussion to
see whether or not it is a good approach and, more importantly, what are
the other implications that go along with it?



> 6. The draft recommends generating a load balancing label in such
>     fashion that the entropy is high. This assumes that the precise
>     form of the label is used to determine the load balancing path
>     (possibly a hash of some sort).
>     Could this mechanism, even if beyond the scope of the document, be
>     explained a bit more ?

Load-balancing over LAG and ECMP paths, using some number of MPLS labels
as input to a load-balancing hash algorithm, is common across all
vendors. However, such algorithms are 'proprietary' to each vendor.
I'm not sure how much more can be said other than the fact that one
would strongly prefer that the output of a LAG or ECMP hashing algorithm
is spread out among the largest number of hash buckets (as is
practical), to get the most even distribution of flows across a set of N
links in a LAG or ECMP path. And, I think the draft already makes this
point, in Section 3:
---snip---
It is recommended that the method chosen to
generate the load balancing labels introduces a high degree of
entropy in their values, to maximise the entropy presented to the
ECMP path selection mechanism in the LSRs in the PSN, and hence
distribute the flows as evenly as possible over the available PSN
ECMP paths.
---snip---
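
A minimal sketch of the recommendation quoted above, assuming the ingress PE derives the load balancing label from a hash of some flow identifier and that core LSRs hash on that label. CRC32 stands in for both the ingress function and the proprietary LSR hash; neither is specified by the draft, and all names are hypothetical.

```python
import zlib
from collections import Counter

RESERVED_LABELS = 16      # label values 0-15 are reserved in MPLS
LABEL_SPACE = 2**20       # labels are 20 bits wide


def lb_label(flow_id: bytes) -> int:
    """Map a flow identifier to a load balancing label spread over the
    whole usable label space (assumption: CRC32 as the ingress hash)."""
    return RESERVED_LABELS + zlib.crc32(flow_id) % (LABEL_SPACE - RESERVED_LABELS)


def spread(flows: list, n_paths: int) -> Counter:
    """Count how many flows land on each of `n_paths` ECMP paths when the
    LSR hashes on the load balancing label alone."""
    counts = Counter()
    for flow in flows:
        label = lb_label(flow)
        path = zlib.crc32(label.to_bytes(3, "big")) % n_paths  # stand-in LSR hash
        counts[path] += 1
    return counts


if __name__ == "__main__":
    flows = [f"flow-{i}".encode() for i in range(1000)]  # hypothetical flow IDs
    print(sorted(spread(flows, n_paths=4).items()))
```

Because a given flow always maps to the same label, its packets stay on one path and keep their order, while distinct flows are free to land in different hash buckets.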


   Is there something else you had in mind?

   -shane


> 7. With the optional mode of section 1.2 several PW labels are mapped to
> a single AC.
> I have no problem with this approach. In fact, I feel that it is
> somewhat similar to the solutions being proposed for PW protection.
> For PW protection two labels are mapped to the AC or end-user
> application, where one label belongs to the active PW, and the other
> to the backup PW (not being used).
> For load balancing two or more PWs, all in active state, are mapped
> to the same AC.
> Would it be possible to integrate the two features into one mechanism
> for mapping multiple PW labels in either active or backup state to
> one AC or end-user identifier?
>
> 8. The term VC as opposed to PW is used in various places in the
> document.
> I am not sure what is meant here. Is the intent that a "VC" is one
> of the paths of the load-balanced "PW" ?
>
> The first paragraph of section 4 seems to imply that the authors are
> willing to settle on either of the modes rather than both. I would
> support the PW label mode.
> If some entropy-rich information needs to be placed in the packet,
> perhaps the flags in the CW could be used (if 16 paths is sufficient).
>
> Y(J)S
>
>
>
>
------------------------------------------------------------------------


   >
   > _______________________________________________
   > pwe3 mailing list
   > pwe3 at ietf.org <mailto:pwe3 at ietf.org>
   > https://www1.ietf.org/mailman/listinfo/pwe3


