I am reviewing this document, draft-ietf-rtgwg-ipfrr-framework-12, as part of the security directorate's ongoing effort to review all IETF documents being processed by the IESG. These comments were written primarily for the benefit of the security area directors. Document editors and WG chairs should treat these comments just like any other last call comments (that you received well after last call). Feel free to forward to any appropriate forum.

This draft describes a framework for mechanisms to compute backup "repair" paths that allow traffic to continue to be forwarded to the destination around failed links or nodes. This is similar to a mechanism in MPLS to provide backup LSPs by using RSVP-TE.

For background, I also looked at:

   RFC 4090
      Fast Reroute Extensions to RSVP-TE for LSP Tunnels
   RFC 5286
      Basic Specification for IP Fast Reroute: Loop-Free Alternates
   draft-ietf-rtgwg-lf-conv-frmwk-06
      A Framework for Loop-free Convergence
   draft-bryant-ipfrr-tunnels-03
      IP Fast Reroute using tunnels
   draft-ietf-rtgwg-ipfrr-notvia-addresses-03
      IP Fast Reroute Using Not-via Addresses

The security considerations section covers some concerns introduced by using fast-reroute mechanisms where providing repair paths may introduce vulnerabilities, particularly where the repair paths could interfere with existing robustness mechanisms (reverse path forwarding and TTL limits).

I wonder if the knowledge of computed repair paths could be useful to an attacker attempting traffic-shaping, i.e., an attacker who cuts links to affect traffic flow. Similarly, the ability to influence the computation of the repair path could be valuable. Perhaps the framework document could mention that the proposed solutions should consider the need to protect the computation from exposure (to the same extent infrastructure information is protected from exposure) and from corruption/contamination, if such an attack were a concern.
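
The exposure concern can be made concrete with a small sketch. The Python below is an illustration only and is not taken from any of the drafts: the topology, node names and function names are all hypothetical. It applies the basic loop-free condition from RFC 5286, dist(N,D) < dist(N,S) + dist(S,D), to show that anyone holding the same link-state information as router S can work out in advance which neighbor will carry the traffic if the link to the primary next hop is cut.

```python
import heapq

def shortest_distances(graph, source):
    """Dijkstra: cost of the shortest path from source to each reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, cost in graph[node].items():
            nd = d + cost
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return dist

def loop_free_alternates(graph, s, primary, dest):
    """Neighbors N of s (other than the primary next hop) that satisfy
    dist(N, dest) < dist(N, s) + dist(s, dest)."""
    dist_s = shortest_distances(graph, s)
    alternates = []
    for n in sorted(graph[s]):
        if n == primary:
            continue
        dist_n = shortest_distances(graph, n)
        if dist_n[dest] < dist_n[s] + dist_s[dest]:
            alternates.append(n)
    return alternates

# Hypothetical topology with symmetric link costs (an assumption for this sketch).
TOPOLOGY = {
    "S": {"E": 1, "N": 1, "M": 1},
    "E": {"S": 1, "D": 1},
    "N": {"S": 1, "D": 2},
    "M": {"S": 1},
    "D": {"E": 1, "N": 2},
}

# S normally reaches D via E. An observer who knows the topology can predict
# where the traffic goes if the S-E link fails.
predicted = loop_free_alternates(TOPOLOGY, "S", "E", "D")
print(predicted)
```

In this toy topology the only neighbor that passes the test is N, so an attacker who wanted traffic for D to transit N would know that cutting the S-E link achieves that for as long as the repair is in effect.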

Note: RFC 5286 dismisses this concern, without noting the benefit to an attacker of being able to produce a desired forwarding change if a failure (and therefore repair activity) could be induced:

   Traffic to certain destinations can be temporarily routed via next-hop routers that would not be used with the same topology change if this mechanism wasn't employed. However, these next-hop routers can be used anyway when a different topological change occurs, and hence this can't be viewed as a new security threat.

The notvia-addresses draft does seem to recognize this risk from repair paths:

   The repair endpoints present vulnerability in that they might be used as a method of disguising the delivery of a packet to a point in the network.

The loop-free convergence framework draft in the last-called version said:

   All micro-loop control mechanisms raise significant security issues which must be addressed in their detailed technical description.

which I thought should be reflected in this framework, but that has changed in the post-last-call version to:

   This document analyzes the problem of micro-loops and summarizes a number of potential solutions that have been proposed. These solutions require only minor modifications to existing routing protocols and therefore do not add additional security risks. However a full security analysis would need to be provided within the specification of a particular solution proposed for deployment.

I'm curious as to how "significant security issues" changed to "do not add additional security risks", but I don't have the time to track down the last call comments that created that change. (I'd say that adding any additional information to a routing protocol may not change the underlying vulnerabilities of the routing protocol, but it certainly provides new means to cause damage when exploiting the known vulnerabilities.)

Non-security related comments:

The following comment would have been useful to see before section 6:

   6. Scope and applicability

   The initial scope of this work is in the context of link state IGPs. Link state protocols provide ubiquitous topology information, which facilitates the computation of repairs paths.

The following comment would have been useful to see before section 4.2.5:

   Complete protection against multiple unrelated failures is out of scope of this work.

Some terms in this document are never defined and/or are used ambiguously. The following terms are not in the terminology list:

   repair path

   Shared Risk Link Groups (SRLG) - perhaps so common in the telecom world and in optical networks that definition was judged unnecessary.

   "link(node) protecting", "being protected", "fully protected (link/node)", "unprotectable" - it would seem to refer to a link or node failure for which a repair path is being computed, but may also mean mechanisms to avoid micro-loops.

   a path being "affected" by a reconvergence - which I think means that reconvergence to a new forwarding tree does not change any link in the repair path

   a repair path being "valid for a node" or "valid for destinations"

Some text I found confusing:

Page 7:

   2. In topologies that are susceptible to micro-loops, a mechanism to prevent the effects of any micro-loops during subsequent re-convergence.

Because the loop free convergence draft distinguishes micro-loop prevention from micro-loop suppression, which attempts to avoid the impact on other traffic from micro-loops, I thought "the effects of any micro-loops" referred to this collateral damage. But later on that page it talks about micro-loop avoidance, which sounds like the loop free convergence draft's term "micro-loop prevention". So I'm not sure what is meant here.

Page 10:

   1. The percentage of links (or nodes) which can be fully protected for all destinations. This is appropriate where the requirement is to protect all traffic, but some percentage of the possible failures may be identified as being un-protectable.

What does it mean to be "fully protected for all destinations"? Is that redundant? And what about "unprotectable"? Is it possible to have

   fully protected for all destinations
   fully protected for some destinations
   partially protected for all destinations
   partially protected for some destinations
   unprotectable for all destinations
   unprotectable for some destinations

etc.?

The outline in section 4 has me a bit confused:

   4.  Mechanisms for IP Fast-reroute . . . . . . . . . . . . . . . .  7
     4.1.  Mechanisms for fast failure detection  . . . . . . . . . .  7
     4.2.  Mechanisms for repair paths  . . . . . . . . . . . . . . .  8
       4.2.1.  Scope of repair paths  . . . . . . . . . . . . . . . .  9
       4.2.2.  Analysis of repair coverage  . . . . . . . . . . . . .  9
       4.2.3.  Link or node repair  . . . . . . . . . . . . . . . . . 10
       4.2.4.  Maintenance of Repair paths  . . . . . . . . . . . . . 11
       4.2.5.  Multiple failures and Shared Risk Link Groups  . . . . 11
     4.3.  Local Area Networks  . . . . . . . . . . . . . . . . . . . 12
     4.4.  Mechanisms for micro-loop prevention . . . . . . . . . . . 12

Section 4.2.5 covers SRLGs as an example of multiple related failures, which it says are out of scope. Then section 4.3 covers LANs - which would seem to me to satisfy the definition of an SRLG. Is 4.3 supposed to be a subtopic under 4.2.5? It seems out of keeping with the mechanism focus of sections 4.1, 4.2 and 4.4.

Also, 4.3 says:

   Protection against partial or complete failure of LANs is more complex than the point to point case. In general there is a trade-off between the simplicity of the repair and the ability to provide complete and optimal repair coverage.

(That's a complete quote of that section, which is another reason for asking if it is really supposed to be a separate section.)

Does this imply that all previous discussion was about point to point links only?
RFC 5286 has an extended discussion of computing backup paths for broadcast and NBMA links, so I was surprised to see this draft seem to indicate that LANs were out of scope.

--Sandy Murphy