Title: Liaison Response to ITU-T Q14/15 Liaison about CCAMP Crankback Draft
Date: 15 Jan 2005
From: Adrian Farrel

To: Mr. Kam Lam, Rapporteur Q14/15
From: Adrian Farrel and Kireeti Kompella, IETF CCAMP co-chairs
Cc: Alex Zinin and Bill Fenner, IETF Routing Area Directors
    Scott Bradner, IETF liaison to ITU-T
Subject: Crankback in GMPLS Systems
For: Information

Dear Kam,

Thank you for your liaison concerning draft-ietf-ccamp-crankback-03. It is useful to have additional review input from a wide audience. Please convey our special thanks to Stephen Shew and Marco Carugi for their detailed review of the draft in Geneva.

We would like to urge Q14/15 to continue to consider this draft as further work is carried out on crankback within the context of G.7713.

In response to the specific points that were raised in the liaison...

> 1. Semantics of the term "node". Due to the GMPLS principle of
> maintaining separation of control and transport (data/bearer) planes,
> there are two meanings for the term "node". First, an instance of a
> signalling protocol (and/or routing protocol) that has some transport
> resources in its scope. Second, a transport plane resource such as a
> cross connect. Using the first meaning, a node is not the context for
> the interface identifiers that are passed in crankback TLVs.
> Throughout the document the particular meaning can be determined
> by the context of the term. Examples are:
>
> - Section 5.2, the sentence "Otherwise, multiple nodes might attempt to
> repair the LSP." means the control functions of signalling and routing.
>
> - Section 7.1 "As described above, full crankback information SHOULD
> indicate the node, link and other resources, which have been attempted."
> refers to the transport resource.

It is correct to observe that historically there has been poor separation of controllers and transport devices within GMPLS, with much of this issue arising from the historic collocation of controllers and data switches in MPLS networks.
This persists because of the (eminently sensible) tendency to optimize for the majority case.

However, in the case of crankback, and specifically in the case of this draft, the emphasis in providing 'full crankback information' is on the addresses of transport links and nodes and not controllers.

We will revisit the draft to ensure that where control plane function is implied, the "node" that takes action is clearly identified as the control plane node.

> There are some occasions where the use of the term appear to be
> ambiguous and clarity would be appreciated. In particular TLV
> types 10 and 32. If type 10 represents a routing and signalling
> function, then what TLV describes the "transport plane node"
> (e.g., cross connect or Network Element)? If type 32 means
> "transport plane nodes", then a different TLV may be needed
> to identify the "routing/signalling nodes" that have already
> participated in crankback attempts.
> Having a clearer distinction between control plane functions
> and transport plane resources would be helpful.

As indicated above, the intention of crankback is to apply a process to the path determination for an LSP. The path is determined using transport plane links and nodes, and although there may be some interesting aggregation available by converting this information to control plane nodes, the conversion is not necessarily simple. Thus, these TLVs all refer to transport plane quantities, and we will make this clearer in the draft.

Again, of course, in the majority case we can make considerable optimizations by knowing that control plane and transport plane "nodes" are related in a 1:1 ratio and are usually collocated.

> 2. When crankback information is received at a "routing/signalling
> node", can it be used by the routing path computation function for other
> LSP requests than the LSP whose signalling caused the crankback action?

It is generally out-of-scope for the IETF to dictate how individual implementations operate.
It is quite conceivable that such an action would be taken, but it is also clear that there is a potentially dangerous interaction with the TE flooding process (i.e. the IGP). Thus we would say that the crankback information MAY be used to inform other path computations.

We would want to be very cautious that crankback is not intended to supplement or replace the normal operation of the TE flooding mechanism provided by the TE extensions to the IGP except for the establishment of a single LSP. If the IGP is found to be deficient as a flooding mechanism we would expect to look first at ways to address the problems through IGP extensions before utilizing a signaling mechanism.

We will look at how to add some of this information to the draft.

> 3. Section 6.1 "Segment-based Re-routing" option. It is not clear
> what this means. Can multiple "routing/signalling nodes" perform
> crankback on the same LSP at the same time if this flag is set?

Since the intention is to establish only one LSP, there must be only one active sequence of LSP setup messages (RSVP-TE Path messages) at any time. Thus only one LSR may attempt re-routing at any one time.

If you consider the processes by which Path messages are attempted and crankback information is returned on PathErr messages, this will be clear. That is, when an LSR receives a crankback PathErr, it may attempt to re-route or it may forward the PathErr back upstream.

It might help if we reworded the draft to say "Any node may attempt rerouting after it receives an error report and before it passes the error report further upstream."

> 4. Section 4.3 History persistence. If a repair point (a
> "routing/signalling node") is unsuccessful in a crankback attempt, is it
> possible for it to be not involved when another repair point (e.g.,
> closer to the source) succeeds in a crankback attempt. If so, how
> does the first repair point know to clear its history?

Note that the purpose of the history table as described in section 4.3 is to correlate information when repeated retry attempts are made by the same LSR. Suppose an attempt is made to route from A through B, and the signalling controller for B returns a failure with crankback information. An attempt may be made to route from A through C, and this may also fail with the return of crankback information. The next attempt SHOULD NOT be to route from A through B, and this is achieved by use of the history table.

The history table can be discarded by the signaling controller for A if the LSP is successfully established through A. The history table MAY be retained after the signaling controller for A sends an error upstream; however, it is questionable what value this provides, since a future retry as a result of crankback rerouting should not attempt to route through A (such is the nature of crankback). If the history information is retained for a longer period it SHOULD be discarded after a local timeout has expired, and that timer MUST be shorter than the timer used by the ingress to re-attempt a failed service (note that re-attempting a failed service is not the same as making a re-route attempt after failure).

As mentioned for point 2, the crankback information MAY be used to enhance future routing attempts for any LSP, but this is not what section 4.3 is describing.

We will try to clarify this in the draft.

> 5. Section 4.5 Retries. Some guidance on setting the number of
> retries may be helpful as this is a distributed parameter. Is it set to
> be the same value at all points that can perform crankback within one
> network?

The view of CCAMP at the moment is that although it is technically possible to allow the number of retries to be set for each LSP, this probably represents too much configuration and too fine a level of control.
It seems likely that initial deployments will wish to set the number of retries per node through a network-wide configuration constant (that is, all LSRs capable of retrying will apply the same count) with the possibility of configuring specific LSRs to have greater or lower counts. Note that configuring an LSR not to be able to perform retries is equivalent to configuring the retry count to be zero for that LSR.

It is also probable that initial deployments will significantly restrict the number of LSRs within the network that can perform crankback rerouting. This would probably be limited to "boundary" nodes.

In the event that implementations and deployments wish to control the number of retries on a per-LSP basis, we would revisit the signaling specification and add the relevant information to the Path and PathErr messages.

The actual value to set for a retry threshold is entirely a deployment issue. It will be constrained by the topology and nature of the network. It would be inappropriate to suggest a figure in this draft since there are no hard and fast rules.

In review of section 4.5 of the draft, we see that there is some old text describing more flexibility in the control of retries than we intend to provide. Thank you for drawing our attention to this; we will clean it up.

Thank you once again for your feedback on this draft. If you have further comments, we would certainly like to hear them. The easiest way for individuals to contribute to the discussion of this topic is by sending mail to the CCAMP mailing list. Details of how to subscribe to this list can be found at http://www.ietf.org/html.charters/ccamp-charter.html

Yours sincerely,
Adrian Farrel and Kireeti Kompella