[mpls] AD review of draft-ietf-mpls-inter-domain-p2mp-rsvp-te-lsp

"Adrian Farrel" <adrian@olddog.co.uk> Thu, 23 May 2013 20:53 UTC

From: Adrian Farrel <adrian@olddog.co.uk>
To: draft-ietf-mpls-inter-domain-p2mp-rsvp-te-lsp@tools.ietf.org
Date: Thu, 23 May 2013 21:11:02 +0100
Cc: mpls@ietf.org, mpls-chairs@tools.ietf.org
Subject: [mpls] AD review of draft-ietf-mpls-inter-domain-p2mp-rsvp-te-lsp
Reply-To: adrian@olddog.co.uk

Hi,

I have done my usual AD review of this draft on receipt of the 
publication request from the MPLS WG chairs.  As well as the desire
to remove issues that might otherwise show up during IETF last call and
IESG review (thereby saving resources), my review is to make sure that
I understand the intention and details of the work so that I can support
the document as it goes through these later reviews and also the
publication process.

As you will see from my comments below, I am not currently comfortable
with the document. At the moment I am not saying that the proposals are
bad and dangerous, although I do at least believe them to be
unnecessary and based on some misstatements of the problems that need to
be addressed.

I understand that the chairs report there is WG consensus behind this
document in its current form, but I am unable to support it for 
publication as an RFC.

You have two options open to you to advance your work as a standards 
track RFC. Firstly, you could seek to address my concerns through a 
combination of changes to the text and discussions with me. Secondly,
you can attempt to find another AD to sponsor the work - possibly 
Stewart is a good starting point.

For the moment, I am returning the I-D to the working group.

Thanks,
Adrian

---

Was this document shown to the CCAMP working group? The P2MP work
(4875) was developed in partnership with CCAMP because it was intended
to be equally applicable to MPLS-TE and GMPLS. Presumably this is also
true of this work.  You might find that by consulting CCAMP you are
able to get some more P2MP implementers to have a look at the problems
this draft is proposing to solve.

---

I am surprised that the working group has taken this approach to the
described problem of re-merge avoidance. This problem is addressed by
the combination of PCE and Path Keys. However, if the working group has
considered that approach and has consensus to take this other approach,
I will not object.

Could the chairs confirm that the existing mechanisms have been 
considered and that the WG determined to add new extensions to the
signaling protocols instead?

---

I'm afraid I find myself wondering whether this document is addressing
the wrong problem :-(  Re-merge is a bad thing to have as a persistent
situation, and certainly must not be allowed to cause data duplication
downstream of the re-merge point.

However, avoiding re-merge by selecting disjoint paths is not the
solution. If re-merge happens, it is because the path to one set of 
destinations has intersected the path to another set of destinations.
When this happens, one of the two paths to the re-merge point must be
optimal (for *any* definition of optimal) or the paths are equally
optimal. In either case, the correct solution is to move all of the
destinations (the union of the two sets) onto the same path and prune
out the sub-optimal one. In this case, the bottom line will be that
the upstream branch was wrong to use two distinct domain border nodes
for the two sets of destinations. That is the problem that needs to be
fixed.
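
To make the point concrete, here is a minimal sketch of the resolution
described above. It is not taken from the draft or from RFC 4875: the
names Branch and resolve_remerge, and the idea of reducing "optimal" to
a single numeric cost per path, are assumptions made purely for
illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """One path from the upstream branch node to the re-merge node (hypothetical model)."""
    border_node: str          # domain border node this path goes through
    cost: int                 # cost to the re-merge node under whatever metric is in use
    destinations: frozenset   # leaves currently reached via this path


def resolve_remerge(a: Branch, b: Branch) -> Branch:
    """Keep the cheaper path and move the union of destinations onto it.

    On a tie the first branch is kept. The other branch is pruned.
    """
    keep = a if a.cost <= b.cost else b
    return Branch(keep.border_node, keep.cost, a.destinations | b.destinations)


via_abr1 = Branch("ABR1", 30, frozenset({"D1", "D2"}))
via_abr2 = Branch("ABR2", 20, frozenset({"D3"}))
merged = resolve_remerge(via_abr1, via_abr2)
print(merged.border_node, sorted(merged.destinations))
```

In this example all three destinations end up on the path through ABR2,
even though ABR1 originally carried more of them: the choice is driven
by path cost, not by the number of destinations.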

What I seem to be reading here is an attempt to avoid re-merge that
favors the use of suboptimal paths. In many network configurations that
will be impossible. But in any case, it will prove as costly in terms of
network resources as the re-merge itself.

The discussion about re-optimization using 4736 is a different problem
that you can raise. The description of the problem doesn't really
surface until the description of the solution in Section 1.1, which is a
pity. This is largely a descriptive issue, although I would argue that
the partial reoptimization is easily handled using partial resignaling,
and therefore without any protocol extensions.

---

Please fix the minor spacing nit reported by idnits.

---

idnits shows several problems with references.

You are missing RFC 2119 from the normative references.
You are missing RFC 4874 from the informative references.
You are missing RFC 2205 from the normative references.
RFC 4726 is an informative reference, but not cited. Possibly work it
into the 4th para of the Introduction?
RFC 5920 is a downref. Does it need to be a normative reference?
RFC 4736 is a downref. It appears that you are using it in a normative
way - please confirm this so that we can handle the downref correctly.

Document Shepherd - please update the write-up to correctly note the
downrefs that remain after this work.

---

Please remove citations from the Abstract. The Abstract is stand-alone
text and cannot have external references.

---

The Abstract is not clear and needs work.
- The issues *may* arise, but do not always arise
- The "computation of loosely routed inter-domain P2MP-TE LSP paths that
  are re-merge free" is not an issue and is not addressed in this 
  document. Please describe the actual issue you are addressing.
- s/vs./versus/
- I don't think "the loosely routing domain ingress border node is not
  aware of the reoptimization scope" describes a problem well because
  the issue is that "there is no way to indicate which branches of the
  P2MP tree are to be reoptimised".
- In the light of my observation that techniques already exist to
  address the problems described in this document, it may be too strong
  to say "This document defines the required protocol extensions needed
  for ..." Maybe change this to "This document defines signaling
  protocol extensions for..."

---

It would really help the reader and reviewers if someone could take an
editorial pass on the document. This is not so much a problem of English
usage as missing words and such like. A native speaker would clean it up
very quickly and avoid the risk of the RFC Editor accidentally breaking 
the technical content.

---

In the Introduction...

   Consequently one
   of the requirements for signaling P2MP LSPs is to choose a P2MP path
   that is re-merge free.

Is this a signaling requirement? 
1. Surely it is a path computation requirement, if it is a requirement 
   at all.
2. Isn't the point that signaling is supposed to detect and resolve
   re-merge issues rather than avoid them?

---

In the Introduction...

   For the purposes of this document, a domain is considered to be any
   collection of network elements within a common sphere of address
   management or path computational responsibility. Examples of such
   domains include Interior Gateway Protocol (IGP) areas and Autonomous
   Systems (ASes). A border node is a node between different routing
   domains.

"domain" or "routing domain"?

---

In the Introduction...

   In that case, the border node for a new domain will be
   given loose next hops for one or more destinations in a P2MP LSP.

s/will be/may be/ ?

---

In the Introduction...

   A
   border node can ensure that it computes the re-merge free paths while
   performing loose hop ERO expansions by individually grafting
   destinations. Note that the computed P2MP tree by a border node in
   this case may not be optimal.

Why are you suggesting a mechanism for computing paths? That is an 
implementation detail. Furthermore, suggesting a mechanism that is
almost certain to generate a suboptimal solution seems perverse!

---

In the Introduction...

   In that case, existing protocol mechanisms
   do not provide sufficient information for it to be able to expand the
   loose hop(s) such that the overall P2MP LSP tree is guaranteed to be
   re-merge free.

Weeeeell...

Even if you don't want to use PCE and Path Key, you could use RRO and
XRO with suitable staggered processing at the branch node. That would
provide sufficient information using existing protocol elements, 
although a small (but obvious) piece of processing needs to be added at
the branch node.

---

In the Introduction...

   [RFC4875] specifies two approaches to handle re-merge conditions. The
   first method is based on control plane handling the re-merge. In this
   case the node detecting the re-merge condition, i.e. the re-merge
   node initiates the removal of the re-merge sub-LSP(s) by sending a
   PathErr message(s) towards the ingress node. However, this can lead
   to a deadlock in setting up the P2MP LSP in certain cases; for
   example, when the first S2L setup causes the re-merge with all
   subsequent S2Ls in the tree.

I am glad you are now discussing the mechanisms in 4875, but I disagree
that there is a deadlock condition as you claim. You are saying that if
the first S2L sub-LSP takes a route that causes all other S2L sub-LSPs
to re-merge then there will be deadlock. Far from it! What will happen 
is that a PathErr will be sent for each subsequent S2L sub-LSP stating
"re-merge", and the destinations for each subsequent S2L sub-LSP will
be added to the set of destinations in the first S2L sub-LSP.

So, what is the issue with the first mechanism in 4875?

---

In the Introduction...

   [RFC4736] defines procedures and signaling extensions for
   reoptimizing an inter-domain P2P LSP. Specifically, an ingress node
   sends a "path re-evaluation request" to a border node by setting a
   flag (0x20) in SESSION_ATTRIBUTES object in a Path message. A border
   node sends a PathErr code 25 (notify error defined in [RFC3209]) with
   sub-code 6 to indicate "preferable path exists" to the ingress node.
   The ingress node upon receiving this PathErr may initiate
   reoptimization of the LSP. [RFC4736] however does not define a
   procedure to reoptimize the entire P2MP LSP as a whole tree.

I'm afraid that the mechanism you have accurately described could be 
used precisely as specified for reoptimising the whole tree since such 
reoptimization takes place from the ingress node. 

But I think (from 1.1) that you are trying to describe the subtleties 
involved in reoptimizing only some parts of the tree downstream of the
reporting border node.

Firstly, you need to clarify the problem being addressed with clearer
text in the Introduction.

Then you need to decide why you need to address the problem. In 4736 
there is no distinction made about which of the loose hops should be 
reoptimised and which not. So it is unclear why you should want to
apply such a filter in the case of a P2MP LSP. If there is a reason
it would be good to set it out in more detail.

It is also possible that the ingress will want to dampen its activity to
ensure it receives all 25/6 reports before starting reoptimization. It
is also possible that if you ignore the way 4875 handles remerge, this
could get complicated.

---

In the Introduction...

   The
   Sub-Group-Based reoptimization is not always applicable because it
   can lead to data duplication inside the backbone.

I am suspicious of this statement! Are you saying that the remerge 
issues may give rise to data duplication? Or is it the make-before-
break nature that may cause the problem? Is it a transitory problem
during the one or two seconds of signaling change, or is it a long-term
problem?

If you want to persist with this assertion then you need to substantiate
it in the draft.

---

The comparison in Section 1.1 of re-merge "avoidance" with crankback is
interesting, but the two issues are different. Crankback is designed to
facilitate re-route to avoid blocking resources, while re-merge 
avoidance is about moving destinations from one sub-tree to another.

But, you could consider the PathErr mechanism of 4875 as crankback 
because it reports the error relating to the re-merge node, it unpicks 
the LSP setup, and it allows an upstream node to "correct" the issue.

The only thing that you might want to add to 4875 is the replacement of
the ID of the reporting node with the ID of the reporting domain so that
the upstream node can apply meaning to the PathErr. Maybe if the problem
was clearly described, this solution would stand out.

---

Section 1.3 says that this work is limited to "multiple routing domains
that belong to a single administrative area. Use case for the Multiple
administrative domains (e.g. autonomous systems) is outside the scope
of this document."

A couple of points:
- An "administrative area" is a new term and the next sentence uses 
  "administrative domains" so that is probably what you mean.
- It is unclear whether multiple ASes is in scope since the second 
  sentence appears to rule them out, but multiple ASes may be under
  the care of one administrator.
- You haven't given any reason for excluding multiple administrative
  domains, and that would help people understand what your objectives
  are and what the problems are.

---

Section 3

   It is RECOMMENDED that boundary re-routing is requested for P2MP LSPs

The use of upper case "RECOMMENDED" is equivalent to "SHOULD".  This is
usually protocol requirements language.  Anyway, if you use "SHOULD" you
also need to discuss the associated "MAY".

---

In Section 3.1 you recommend that the ingress node of a P2MP LSP selects
the same ingress border node in the loose hop ERO for all sibling S2L
sub-LSPs that transit through a given domain.

This, of course, produces sub-optimal LSPs and can be resolved using 
PCE.

But I am interested in the overlap between this statement and the scope
statement in 1.3.  You are recommending using only single attachments
between domains: that means that remerge can only happen when the sub-
LSPs transit different domains and come back together in a further
domain.  But since you are (apparently) limiting to IGP areas and ruling
out ASes, the largest domain diameter you have is 3 with the result that
remerge is entirely impossible! 

---

In Section 3.1 you have "RECOMMENDED". What is the associated "MAY"?

---

Here's another example from Section 3.2

   If an ingress border node on the path of the P2MP LSP is unable to
   find a route that can supply the required resources or that is re-
   merge free, it MUST generate a PathErr message for the subset of the
   S2L sub-LSPs which it is not able to route.

This implies that the border node is trying to find a disjoint path.
Such a path represents a waste of network resources that is *worse*
than the data plane remerge case that you reject as a bad idea.

The point of re-merge avoidance is simply moving destinations from one
sub-tree to another and it can only be done at the branch point for the
two trees - or even at the ingress depending on your signaling approach
and explicit paths.

---

Section 3.2

   For this purpose the                        
   ingress border node SHOULD try to find a minimum subset of S2L sub-
   LSPs for which the PathErr needs to be generated towards the ingress
   node. These are the S2L sub-LSPs on an incoming interface that has
   less number of S2L sub-LSPs compared to the second incoming interface
   that is causing the re-merge condition.

OK. It took me four or five readings and a lot of pain to parse what you
are trying to say.

You are saying that, if the border node is a branch node for two sets of
sub-LSPs that are remerging, then the border node should fix this by 
moving the smaller set to share the path with the larger set.

There are two reasons why this is a bad idea:

1. The first set may have been set up with a Path/Resv exchange, while
   the second set has not been set up and a PathErr was returned.
   Maybe the remerge node could have made the decision you are 
   suggesting, but even that sounds like a bad idea.

2. The choice of the correct path to the remerge node should be made
   according to which is the shortest path, not which path has the 
   most destinations.

---

Section 3.2

   The RSVP-TE Notify messages do not include S2L_SUB_LSP objects and
   cannot be used to send errors for a subset of the S2L sub-LSPs in a
   Path message. 

This is not true!

A Downstream Notify message (headed upstream) is described in RFC 3473
as:

   <Notify message>            ::= <Common Header> [<INTEGRITY>]
                        [ [<MESSAGE_ID_ACK> | <MESSAGE_ID_NACK>] ... ]
                                   [ <MESSAGE_ID> ]
                                   <ERROR_SPEC> <notify session list>

   <notify session list>       ::= [ <notify session list> ]
                                   <upstream notify session> |
                                   <downstream notify session>

   <downstream notify session> ::= <SESSION> [<POLICY_DATA>...]
                                   <flow descriptor list>

And according to RFC 4875, <flow descriptor list> nets down to one or
more of...

   <S2L sub-LSP flow descriptor> ::= <S2L_SUB_LSP>
                                     [ <P2MP_SECONDARY_RECORD_ROUTE> ]
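
As a rough illustration of why this matters, the grammar quoted above
can be modelled as nested records. This is a simplified sketch, taking
the quoted BNF at face value: the class names are hypothetical, the
optional INTEGRITY, MESSAGE_ID and POLICY_DATA objects are left out,
and object contents are reduced to plain strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class S2LSubLSPFlowDescriptor:
    s2l_sub_lsp: str                                        # S2L_SUB_LSP (destination)
    p2mp_secondary_record_route: Optional[List[str]] = None  # optional per the grammar


@dataclass
class DownstreamNotifySession:
    session: str                                            # SESSION object
    flow_descriptors: List[S2LSubLSPFlowDescriptor] = field(default_factory=list)


@dataclass
class NotifyMessage:
    error_spec: str                                         # ERROR_SPEC object
    sessions: List[DownstreamNotifySession] = field(default_factory=list)

    def affected_destinations(self) -> List[str]:
        """List every S2L sub-LSP destination named in the message."""
        return [fd.s2l_sub_lsp for s in self.sessions for fd in s.flow_descriptors]


# A Notify that reports an error for only two of the S2L sub-LSPs of one session.
msg = NotifyMessage(
    error_spec="example error",
    sessions=[DownstreamNotifySession(
        session="p2mp-session-1",
        flow_descriptors=[S2LSubLSPFlowDescriptor("D2"), S2LSubLSPFlowDescriptor("D3")],
    )],
)
print(msg.affected_destinations())
```

The point of the sketch is only that the message structure has room for
a per-S2L list, so a subset of the sub-LSPs can be identified.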

---

Section 3.2

   A border node receiving a PathErr message for a set of S2L sub-LSPs
   MAY hold the message and attempt to signal an alternate path that can
   avoid re-merge through its domain for those S2L sub-LSPs that pass
   through it. However, in the case of a re-merge error for which some
   of the re-merging S2L sub-LSPs do not pass through the border node,
   it SHOULD propagate the PathErr upstream towards the ingress node. If
   the subsequent attempt by the border node is successful, the border
   node discards the held PathErr and follows the crankback roles of
   [RFC4920] and [RFC5151]. If repeated subsequent attempts by the
   border node are unsuccessful, the border node MUST send the held
   PathErr upstream towards the ingress node.

How can an attempt to avoid re-merge be unsuccessful? There is already a
suitable path. We know this because the re-merge has happened.

---

Section 3.2

   If the ingress node receives a PathErr message with error code
   "Routing Problem" and error value "ERO resulted in re-merge", then it
   SHOULD attempt to signal an alternate path through a different domain
   or through a different border node for the affected S2L sub-LSPs. The
   ingress node MAY use the error node information from the PathErr for
   this purpose.

This is plain wrong. It should attempt to move the destinations to the 
same sub-tree, not try to signal the remerging destinations on a 
diverse sub-tree.

---

Section 4 purports to be about the dataplane re-merge handling scenario,
and it starts well. But then...

   The following sections define the RSVP-TE signaling extensions for
   "P2MP- TE Re-merge Recording Request" and "P2MP-TE Re-merge Present"
   messages.

That looks like it is control plane work and so does not belong in this 
section.

Furthermore, this section appears to be offering a third solution to add
to the two noted in 4875. That is, you are proposing that dataplane
handling should be used, but that the control plane should be used to
resolve the issue.

This is an OK idea, but was discussed at the time of 4875 when two
approaches to handling this situation were discussed.

1. Use a Notify message sent after the Resv
2. Use a non-destructive PathErr sent after the Resv

Admittedly, neither of those approaches is quite tidy, but it was 
thought that if you cared about the remerge you would fix it at setup
time, and if you didn't care, then you didn't care. Thus the case you
are fixing is a corner case (care a bit, but not too much) and the 
existing untidy solutions are enough.

Nevertheless, the mechanism you describe does provide some additional
useful diagnostics, and so should not be ruled out. Of course, those
diagnostics are already visible simply by inspecting the full set of
RRO information from the various S2L sub-LSPs as a re-merge point will
show up as the same node appearing in two different sub-trees. Thus, the
question only applies when RRO information is being stripped at domain
boundaries.
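
A minimal sketch of that RRO inspection follows. It assumes a much
simplified view in which each S2L sub-LSP's RRO is just an ordered list
of node IDs from ingress to destination; the function name
find_remerge_nodes and the sample topology are hypothetical. A node is
flagged when it is reached from more than one upstream neighbour, which
cannot happen in a tree. Re-merge over parallel links between the same
pair of nodes would need interface-level RRO detail and is not covered.

```python
from collections import defaultdict


def find_remerge_nodes(rros):
    """Return {node: set of upstream neighbours} for nodes with more than one.

    `rros` maps a destination to the ordered list of node IDs recorded
    from the ingress to that destination.
    """
    upstream = defaultdict(set)
    for hops in rros.values():
        # Pair each hop with the hop that precedes it on the same path.
        for prev, node in zip(hops, hops[1:]):
            upstream[node].add(prev)
    return {node: prevs for node, prevs in upstream.items() if len(prevs) > 1}


rros = {
    "D1": ["S", "A", "C", "D1"],
    "D2": ["S", "B", "C", "D2"],
    "D3": ["S", "A", "D3"],
}
remerges = find_remerge_nodes(rros)
print(remerges)
```

Here node C is reported because the sub-LSP to D1 reaches it from A
while the sub-LSP to D2 reaches it from B. Node A is not reported: D1
and D3 share it as an ordinary branch point.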

However, this last issue is the one you note in the final paragraph of
section 4.3. There you appear to say that since the re-merge report info
would be stripped from the Resv, the Resv should be discarded and a 
PathErr sent upstream. That would be fine, but it does seem to leave
half an LSP provisioned and doing nothing (downstream between the border
node and the re-merge point).

---

4.3

   This can be achieved by computing and
   selecting alternate path(s) for the S2L(s) bypassing the re-merge
   node(s).

Again, avoiding the remerge node is not the best result.

---

Section 5

   Re-merges between S2Ls in a single domain can occur due to
   provisioning errors or path computation errors in the environment
   where IGP-TE or PCE is used.

What is a "provisioning error"? Are you referring to cases where the 
EROs of P2MP trees are entered by hand?

What has IGP-TE to do with this?

If PCE (whether a separate component or embedded in an LSR) is making
such fundamental computation errors, then we should certainly not crash,
but I also don't think we should optimise the protocol to handle it. 
This represents a critical implementation bug!

---

Section 6.3

   Using signaling procedure defined in [RFC4736], an ingress node MUST
   initiate "path re-evaluation request" query to reoptimize a
   destination in a P2MP LSP. Note that this message MUST be used to
   reoptimize a single or a sub-set of the destinations in a P2MP LSP.
   Ingress node MUST send this query in a Path message for each
   destination it is reoptimizing.

   When a Path message for a destination in a P2MP LSP with "path
   re-evaluation request" flag [RFC4736] is received at the border node,
   it MUST re-compute the loose-hop ERO to see if a preferable path
   exists for that destination.

I am hugely worried about your use of 2119 language in this text. Is it
your intention to redefine the procedures of RFC 4736? Because that is
what you are doing! 

Perhaps you consider that you are only defining the procedures for P2MP
LSPs with the claim that they were not covered by 4736. I don't see on
what evidence such a claim would be made, and I am particularly
concerned that you have turned the request in 4736 into a demand in your
draft.

---

Section 6.3 is almost impossible to understand. I think there are 
probably some assumptions that a request to reoptimise the path to a 
single destination would:
a. be made
b. be responded to with "not telling you, but I could reoptimise the 
   whole sub-tree."

On the other hand, there also seems to be a desire to send a 
reoptimise request that identifies just one destination, but is actually
a request to reoptimise the whole tree.

Colour me confused!
