
Re: [PWE3] draft-bryant-filsfils-fat-pw



Shane Amante wrote:
> Luca,
>
> On Dec 11, 2007, at 2:23 PM, Luca Martini wrote:
>> Tom/Shane,
>>
>> Let me see if I can attempt to clarify the requirements.
>> First let's assume that we want to preserve the order of the PW packets,
>> then I see the following situations:
>>
>> 1) PW is of comparable size to some MPLS trunk link in a link
>> bundle (let's say at least 10% of the size of the link).
>> 2) PW is larger than some individual trunk link in some bundle.
>> 3) PW can use network ECMP (no local link bundles are present along
>> the PW path).
>>
>> We also have the following sub cases.
>> a) PW contains IP traffic, and flows can be identified at ingress.
>> b) PW does not contain IP traffic/traffic cannot be ID/IP traffic is
>> a single huge flow.
>>
>> Any combination of the above is possible. So I think this draft
>> makes an attempt at solving 3a and 1a.
>
> I agree with you, so far.  IMHO, in the context of just 1a vs. 3a, 1a
> is a more pressing problem than 3a.
>
>
>> However, I think that the most common problem is 1b. Applications that
>> generate a single large flow of PW traffic tend to be enterprise
>> applications, like database syncing, backups, encrypted 10GE links, etc.
>>
>> It seems to me that the multiple receiving labels per PW solution
>> is simple, does not change the pwe3 architecture, and does not overly
>> complicate the hardware design of the PE. Since this can clearly be
>> used on an exception basis it should be adequate to mitigate problem
>> 1a/3a.
>
> In giving this more thought, and in private discussions with some
> others, the PW Labels Block approach seems more like a 'band-aid',
> than providing a relatively long-term fix for the problem at hand. 
> Specifically, consider
Yes, I agree that it is not perfect; however, it does not change the
forwarding plane, which means that cost-optimized hardware has a chance
of supporting it.
> the following use cases where you [will] see fat PW's:
> 1) point-to-point VPWS, specifically in a hub-and-spoke environment,
> e.g.: a Hub site containing a (N x) 10 GbE NNI to another carrier,
> Enterprise DataCenter, etc.  Funneling in to that Hub site will be
> GbE, N x GbE and 10 GbE EoMPLS VC's.
> 2) multipoint-to-multipoint VPLS.  Although I can't say this is a hard
> requirement at the moment, I would expect that as VPLS adoption
> continues to grow that the solution for "Fat PW's" being discussed
> here will, most likely, get re-used in VPLS at some point down the road.
> 3) perhaps, further out than VPLS, MS-PW ...
>
VPLS and VPWS both use PWs, so I believe that any solution we design
for a PW would automatically apply there as well.

> Ultimately, the PW Labels Block approach burns LFIB space.  This seems
> to me to be particularly problematic in p2mp and mp2mp topologies,
> since the impact on LFIB space is felt by all PE nodes using the PW
> Labels Block approach.  In addition, I'm not necessarily convinced the
> impact on LFIB space will be minor, since the more VC 
I do not believe that there is a requirement for huge single multicast
flows at this point.
Is this what you mean by p2mp?

> labels that are allocated for a particular PW, the more diverse input
> is provided to load-hashing algorithms within core LSR's and,
> ultimately, the more evenly flows are distributed over component-links.
>
> Finally, the PW Labels Block approach is concerning because operators
> will likely have to play around to find the 'right' size PW Labels
> Block for their network, LAG/ECMP sizes, etc.  Thus, as their network
> grows, they'll likely have to go back and re-adjust the PW Label Block
> larger or smaller always trying to optimize LFIB size vs. even
> load-balancing ...
>
This is very tricky. Since there is no standard, we need to guess what
happens here. A small number of labels should be sufficient. We can
always use a programmatic system based on link BW and the number of
links to figure out how many labels we use.
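
A programmatic system of that kind might look roughly like the sketch below. It is only an illustration: the function name, the integer Mb/s units, and the sizing rule itself (keep each label's share of the PW under a chosen percentage of the smallest component link, borrowing the 10% figure from case 1 as a default) are assumptions, not anything specified in the draft or agreed on this list. It also assumes traffic spreads evenly across the labels, which only holds when flows can be identified at ingress.

```python
def labels_needed(pw_mbps: int, smallest_link_mbps: int,
                  max_share_pct: int = 10) -> int:
    """Hypothetical sizing rule: number of PW labels such that each label
    carries at most max_share_pct percent of the smallest component link.

    All bandwidths are integers in Mb/s so the arithmetic stays exact.
    """
    if pw_mbps <= 0 or smallest_link_mbps <= 0:
        raise ValueError("bandwidths must be positive")
    if not 0 < max_share_pct <= 100:
        raise ValueError("max_share_pct must be in 1..100")
    # Per-label budget, scaled by 100 to avoid fractions.
    budget_x100 = smallest_link_mbps * max_share_pct
    # Integer ceiling division: ceil(pw_mbps * 100 / budget_x100).
    return max(1, -(-(pw_mbps * 100) // budget_x100))


# A 10 Gb/s PW carried over 10 Gb/s component links.
print(labels_needed(10_000, 10_000))
```

Under this rule a 10 Gb/s PW over 10 Gb/s component links would get 10 labels, while a 1 Gb/s PW over the same links would need only one.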

> Ultimately, BW is growing and shows no signs of abating, (BW growth is
> good for all of us! :-).  I don't believe, as you state above, that
> this solution will only see limited use in "exception cases".  Other
> networks will, if they 
You make a big assumption that we can identify the flows inside the PW
at the AC.
With 10G Ethernet encryption hardware approaching commodity pricing,
I'm not sure that it is a good assumption.


> don't already, need a solution to this same problem.  Therefore, I
> would advocate we think through the design fully and make sure it's:
> a) easy to configure/use/operate, esp. over long time scales; b) it's
> easily extensible to other protocols, (e.g.: VPLS); and, c) of course,
> scalable and interoperable with other network elements.
>
>
>
>> I do not believe that problem 2 is solvable while keeping the packets
>> in order. However maybe that is not a requirement.
>>
>> In the case of problems in the category "1b", we can solve them by
>> implementation without requiring new protocols.
>
> I'm not sure I follow if, or how, you're proposing to solve 1b,
> since your (b) above says that: "PW does not contain IP
> traffic/traffic cannot be ID/IP traffic is a single huge flow" ...
> then, you go on to list applications for which there is no way for the
> ingress PE to identify a microflow in order to assign either a PW
> Label Block or Load-Balance Label.  Can you clarify what you mean above?
>
Not on this list. ;-)
The point is that there are solutions that do not require us to change
any protocols.

Luca


> Thanks,
>
> -shane
>
>
>
>> So what is the most pressing problem to solve?
>>
>> Luca
>>
>>
>> Thomas Nadeau wrote:
>>>
>>>
>>>
>>>> OK, so speaking as yet another operator....
>>>>
>>>> there's a clear need to support fat PWEs, but I'm yet to be
>>>> convinced that this draft is the correct solution to the problem.
>>>>
>>>> The intro to the draft talks about the application being to
>>>> interconnect IP routers. If that's the case then why not use an IP
>>>> pseudowire?  If you do that then there will just be one label, but
>>>> (AFAIK) many routers will spot the 0x4 (or 0x6) in the first nibble
>>>> of the payload and do a hash on the IP header - giving optimum
>>>> traffic distribution and also preserving the order of each flow.
>>>>
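
The first-nibble heuristic described in the paragraph above can be sketched roughly as follows. This is an illustration only: `flow_key` and `pick_link` are hypothetical names, CRC32 stands in for whatever vendor-specific hash a real LSR uses, and only the IP source and destination addresses are hashed (no ports).

```python
import zlib
from typing import Optional


def flow_key(payload: bytes) -> Optional[bytes]:
    """Guess the flow key from the bytes that follow the label stack.

    Returns source + destination address if the first nibble looks like
    IPv4 (0x4) or IPv6 (0x6), otherwise None.
    """
    if not payload:
        return None
    version = payload[0] >> 4
    if version == 4 and len(payload) >= 20:
        return payload[12:20]   # IPv4 source + destination address
    if version == 6 and len(payload) >= 40:
        return payload[8:40]    # IPv6 source + destination address
    return None


def pick_link(payload: bytes, n_links: int, fallback_key: bytes = b"") -> int:
    """Pick a component link; fall back to another key (for example the
    label stack) when the payload does not look like IP."""
    key = flow_key(payload)
    if key is None:
        key = fallback_key
    return zlib.crc32(key) % n_links
```

Because the key depends only on the addresses, every packet of a given flow maps to the same link, which is what preserves per-flow ordering.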
>>>> If the payload is not IP then I think we have a problem at any
>>>> rate, as we don't necessarily know how to identify a "flow".  Sure,
>>>> you could do a MAC hash for an Ethernet pseudowire, but in many
>>>> cases you see precisely one pair of MAC addresses on the PWE.
>>>>
>>>> Giles
>>>>
>>>> On Nov 28, 2007 2:47 PM, Shane Amante <shane at castlepoint.net> wrote:
>>>>
>>>>    Hi Yaakov,
>>>>
>>>>    Yaakov Stein wrote:
>>>>    > Stewart and other authors
>>>>    >
>>>>    > I just finished reading the FAT-PW draft, and have a few
>>>>    > comments/questions.
>>>>    >
>>>>    > 1. The draft says "Operators have requested the ability..."
>>>>    >     Since I have never heard this request from any of the
>>>>    >     operators with which we work,
>>>>    >     can this be changed to "Some operators have requested ..." ?
>>>>    >     Since there is one operator on the author list, I guess we
>>>>    >     can guess which operator has requested this feature !
>>>>
>>>>    Speaking as /another/ operator, I can say there is an absolutely
>>>>    strong need to solve this problem (and there has been for quite a
>>>>    long time, actually).  Consider the fact that 10 GbE has become (is
>>>>    becoming?) a pretty common access circuit to Backbones and that
>>>>    within most SP networks the dominant Backbone link size is 10G.  As
>>>>    you're likely well aware, the IEEE HSSG is working on both 40 GbE
>>>>    and 100 GbE.  Once 40 GbE is available (and assuming it's used for
>>>>    WAN connectivity, perhaps similar to 10 GbE LAN PHY), then OC-768c
>>>>    Backbone links will suffer the same problem.  100 GbE will,
>>>>    eventually, be used as both core and access links.  In short, this
>>>>    problem is not going away.  We need to solve it.
>>>>
>>>
>>> Agreed.  Speaking as another operator, I too am concerned that we solve
>>> this problem, but I do not like the approaches described in this draft.
>>>>    > 2. The example given is for Ethernet PWs. Is this draft limited
>>>>    >     to this case?
>>>>    >     There is discussion of whether it is limited to IP over
>>>>    >     Ethernet, but this more basic question is not addressed.
>>>>    >     For example, could this load balancing be performed for
>>>>    >     ATM PWs based on the AAL5 flows?
>>>>
>>>>    From my perspective, Ethernet is far and away the biggest "problem
>>>>    child" out there today, due to the size of access to Backbone
>>>>    links (see above).  While it may be admirable to look at making
>>>>    this draft "generic" for a variety of PW types, I wouldn't lose any
>>>>    sleep if this draft remained focused on just Ethernet.
>>>>
>>>>
>>>>
>>>>    > 3. PWs are an emulation of the native service.
>>>>    >    Why is this emulation being called upon to deliver a
>>>>    >    feature NOT present in the native service ?
>>>>    >    Doesn't this break the model a bit?
>>>>    >
>>>>    > 4. A native service processing function is required for
>>>>    >    differentiating between different flows at ingress.
>>>>    >    If this draft is indeed limited to Ethernet PWs, such a
>>>>    >    processing function already exists in the native service.
>>>>    >    802.3 clause 43 (LAG) defines conversations for exactly this
>>>>    >    purpose (commonly implemented by hashing IP addresses and
>>>>    >    port numbers), and even mentions the use of load balancing
>>>>    >    in the distribution of conversations over links.
>>>>    >    I think this function should be at least referenced.
>>>>    >
>>>>    > 5. My greatest problem is with the preferred mode of section 1.1,
>>>>    >     which builds a PW label stack under the MPLS label stack.
>>>>    >     The proposal is for 2 PW labels (once again, somewhat
>>>>    >     breaking RFC3985).
>>>>    >     Figure 2 is not completely clear about the label structure.
>>>>    >     There are two possibilities:
>>>>    >      1) both load balancing label and PW label have stack bit
>>>>    >         set. (I hope not !)
>>>>    >      2) the load balancing label has S=1, and the PW label has
>>>>    >         S=0.
>>>>    >          So formally, the PW label seems to be an MPLS label.
>>>>    >     Both possibilities break the standard model.
>>>>    >
>>>>    >    I would certainly like to see more justification of the
>>>>    >    problem before breaking the model in this way.
>>>>    >    Perhaps a short requirements document is in order?
>>>>
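
For readers following the S-bit discussion in point 5, the sketch below packs MPLS label stack entries using the RFC 3032 layout (20-bit label, 3-bit EXP, 1-bit S, 8-bit TTL). The helper names and the label values 1000/2000/3000 are hypothetical, and the three-entry stack (PSN tunnel label, PW label, load-balancing label at the bottom) is only one reading of possibility 2, not a statement of what the draft's Figure 2 specifies.

```python
import struct


def encode_lse(label: int, exp: int = 0, s: int = 0, ttl: int = 255) -> bytes:
    """Pack one 32-bit label stack entry in network byte order."""
    if not 0 <= label < (1 << 20):
        raise ValueError("label must fit in 20 bits")
    if not 0 <= exp < 8 or s not in (0, 1) or not 0 <= ttl < 256:
        raise ValueError("exp, s or ttl out of range")
    return struct.pack("!I", (label << 12) | (exp << 9) | (s << 8) | ttl)


def decode_lse(data: bytes):
    """Unpack one entry into (label, exp, s, ttl)."""
    (word,) = struct.unpack("!I", data[:4])
    return word >> 12, (word >> 9) & 0x7, (word >> 8) & 0x1, word & 0xFF


def encode_stack(labels):
    """Encode labels top to bottom; only the last entry gets S=1."""
    last = len(labels) - 1
    return b"".join(encode_lse(lbl, s=int(i == last))
                    for i, lbl in enumerate(labels))


# Hypothetical values: PSN tunnel label, PW label, load-balancing label.
stack = encode_stack([1000, 2000, 3000])
s_bits = [decode_lse(stack[i:i + 4])[2] for i in range(0, len(stack), 4)]
print(s_bits)  # S bit of each entry, top to bottom
```

In this reading the PW label is no longer at the bottom of the stack, which is the formal change to the model that point 5 objects to.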
>>>>    When I read the draft, this is the part I also had the most concern
>>>>    with.  In particular, I like the "simplicity" of the LB Label
>>>>    approach (i.e.: savings on FIB space, no need to signal first and
>>>>    last labels for each PW, etc.); however, I am concerned about the
>>>>    implications of, or potential need to, define a 'generic' MPLS PW
>>>>    label.
>>>>
>>>
>>> In addition to this, I suggest that the requirements first be
>>> investigated before we go ahead with this solution. Speaking as
>>> someone who needs to make different boxes interoperate in a network,
>>> I would prefer a SINGLE solution to this problem.
>>> When we have different protocols, it is generally ok to have
>>> different approaches, but having different approaches in this case
>>> seems to make things exponentially harder.
>>>
>>> --Tom
>>>
>>>
>>>>    My primary concern is future extensibility.  Specifically, in case
>>>>    there are /other/ applications, which may or may not have been
>>>>    brought to the surface yet, that may have similar needs/desire for
>>>>    a 2nd PW label.  If that ultimately means we gain consensus to
>>>>    amend the PWE3 Architecture, I'm OK with that, but certainly we
>>>>    would need to have more discussion to see whether or not it is a
>>>>    good approach and, more importantly, what are the other
>>>>    implications that go along with it?
>>>>
>>>>
>>>>
>>>>    > 6. The draft recommends generating a load balancing label in
>>>>    >     such fashion that the entropy is high. This assumes that the
>>>>    >     precise form of the label is used to determine the load
>>>>    >     balancing path (possibly a hash of some sort).
>>>>    >     Could this mechanism, even if beyond the scope of the
>>>>    >     document, be explained a bit more ?
>>>>
>>>>    Load-balancing over LAG and ECMP paths, using some number of MPLS
>>>>    labels as input to a load-balancing hash algorithm, is common
>>>>    across all vendors.  However, such algorithms are 'proprietary' to
>>>>    each vendor.  I'm not sure how much more can be said other than the
>>>>    fact that one would strongly prefer that the output of a LAG or
>>>>    ECMP hashing algorithm is spread out among the largest number of
>>>>    hash buckets (as is practical), to get the most even distribution
>>>>    of flows across a set of N links in a LAG or ECMP path.  And, I
>>>>    think the draft already makes this point, in Section 3:
>>>>    ---snip---
>>>>       It is recommended that the method chosen to
>>>>       generate the load balancing labels introduces a high degree of
>>>>       entropy in their values, to maximise the entropy presented to
>>>>       the ECMP path selection mechanism in the LSRs in the PSN, and
>>>>       hence distribute the flows as evenly as possible over the
>>>>       available PSN ECMP paths.
>>>>    ---snip---
>>>>
>>>>    Is there something else you had in mind?
>>>>
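
As a rough illustration of the label-hash behaviour discussed above, the sketch below hashes a label stack to choose one of N component links. Real LSR algorithms are vendor-proprietary, so CRC32 here is purely a stand-in, and the function names and label values are hypothetical.

```python
import zlib
from collections import Counter


def pick_link(labels, n_links: int) -> int:
    """Map a label stack (sequence of 20-bit ints) to a link index."""
    key = b"".join(lbl.to_bytes(3, "big") for lbl in labels)
    return zlib.crc32(key) % n_links


def spread(stacks, n_links: int) -> Counter:
    """Count how many packets land on each link."""
    return Counter(pick_link(s, n_links) for s in stacks)


# Without a load-balancing label every packet of the PW carries the
# same stack, so the whole PW is pinned to a single link.
single = spread([(1000, 2000)] * 100, n_links=4)

# With a per-flow load-balancing label appended, the hash input varies
# from flow to flow, so the packets can land on different links.
varied = spread([(1000, 2000, lb) for lb in range(16, 116)], n_links=4)

print(single)
print(varied)
```

How evenly the second case spreads depends entirely on the hash and on how much the load-balancing label values differ, which is why the draft asks for high-entropy labels.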
>>>>    -shane
>>>>
>>>>
>>>>    > 7. With the optional mode of section 1.2 several PW labels are
>>>>    >     mapped to a single AC.
>>>>    >     I have no problem with this approach. In fact, I feel that
>>>>    >     it is somewhat similar to the solutions being proposed for
>>>>    >     PW protection.
>>>>    >     For PW protection two labels are mapped to the AC or
>>>>    >     end-user application, where one label belongs to the active
>>>>    >     PW, and the other to the backup PW (not being used).
>>>>    >     For load balancing two or more PWs, all in active state,
>>>>    >     are mapped to the same AC.
>>>>    >     Would it be possible to integrate the two features into one
>>>>    >     mechanism for mapping multiple PW labels in either active or
>>>>    >     backup state to one AC or end-user identifier?
>>>>    >
>>>>    > 8. The term VC as opposed to PW is used in various places in
>>>>    >     the document.
>>>>    >     I am not sure what is meant here. Is the intent that a "VC"
>>>>    >     is one of the paths of the load-balanced "PW" ?
>>>>    >
>>>>    > The first paragraph of section 4 seems to imply that the
>>>>    > authors are willing to settle on either of the modes rather
>>>>    > than both. I would support the PW label mode.
>>>>    > If some entropy-rich information needs to be placed in the
>>>>    > packet, perhaps the flags in the CW could be used (if 16 paths
>>>>    > is sufficient).
>>>>    >
>>>>    > Y(J)S
>>>>    >
>>>>    > _______________________________________________
>>>>    > pwe3 mailing list
>>>>    > pwe3 at ietf.org
>>>>    > https://www1.ietf.org/mailman/listinfo/pwe3


