Hello I have been selected to do a Routing Directorate early review of this draft.. https://datatracker.ietf.org/doc/draft-ietf-bess-evpn-fast-df-recovery. I reviewed revision -07. The Routing Directorate will, on request from the working group chair, perform an early review of a draft before it is submitted for publication to the IESG. The early review can be performed at any time during the draft's lifetime as a working group document. The purpose of the early review depends on the stage that the document has reached. I reviewed revision -07 of this draft after the completion of WG last call and shepherd review, but before the WG makes a publication request. The purpose of a review at this stage is to improve the quality before AD review and so reduce the AD workload as wellas improve the output of the working group and the final quality of any published RFCs. For more information about the Routing Directorate, please see https://wiki.ietf.org/en/group/rtg/RtgDir Document: draft-ietf-bess-evpn-fast-df-recovery-07.txt Reviewer: Adrian Farrel Review Date: 2023-05-27 Intended Status: Standards Track Summary: I have some minor concerns about this document that I think should be resolved before it is submitted to the AD. This document is short and readable. While most of what I found is nits, they somewhat detract from the clarity of the document. The minor points could do with small additions to the text to help the reader and smooth the document's passage through the IESG. === Abstract should note that this document updates RFC 8584 and say (briefly) how. (Note that idnits warned you about this.) I would put similar text in the Introduction, but that text can have a pointer to Section 2.2). I note that the topic of "updates" was raised by matthew in his review, and that 2.2 is clear about the changes. I note, however, that the decision to make this an Update happened after WG last call, and I wonder whether the owrking group is fully on board with the implication that any future implementation claiming conformance to 8584 will be required to implement this specification. There is a difference between an Update and an additional procedure. --- Abstract Move "(DF)" to the first use of "Designated Forwarder" Please expand "EVI" and "PE" s/via a simple signaling/via simple signaling/ --- Move the requirements language into a new Section 1.1 Please use the correct version of the boilerplate text... The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here. --- 1. Please expand "DF", "EVI", and "PE" on first use. s/applying Highest/applying the Highest/ s/independent of number/independent of the number/ s/via a simple signaling/via simple signaling/ s/on simple one-way signaling/on a simple one-way signaling/ --- 1.2 s/in redundancy group/in a redundancy group/ --- 1.2 The RFC Editor will ask you to consider another term in place of "blackholing". You might choose to retain the term, or you might think that it is OK to say "packets being dropped if the timer is too long" There are quite a few similar uses throughout the document. --- 1.2 upon re-entry of the packet I think you mean, "upon the packet re-entering the Ethernet Segment" --- 1.3. This section is a bit messy because it talks about the "proposed solution" in text that will eventually become an RFC, and because it makes cryptic references to mechanisms that have not yet been described. I am not convinced that you actually need this text (why are you trying to sell the benefits having already told us there is a problem to be solved?) but if you want to keep the text, I would suggest that you position it as "design principles" and scale it right back. Something like... 1.3. Design Priniciples for a Solution The solution presented in this document follows several design pronciples as follows. * Complicated handshake signamling mechanisms and state machines are avoided in favor of a simple uni-directional signaling approach. * The solution is backwards-compatible (see Section 4). * Existing DF Election algorithms are supported. * The solution is independent of any BGP delays in propagation of Ethernet Segment routes (Route Type 4). * The solution is agnostic of the actual time synchronization mechanism used. --- 2. s/participating to a common/participating in a common/ s/at a same pre-announced time/at the same pre-announced time/ s/e.g. NTP/e.g., NTP/ s/along with Ethernet/along with the Ethernet/ s/from local PE/from the local PE/ s/is called "Service/is called the "Service/ s/by newly added/by the newly added/ -- 2. What is "carving state"? I see that "service carving" is a term of art in RFCs 7432 and 8584, but there is no mention there of "carving state". --- 2. OLD to communicate to other partners the Service Carving Time NEW to communicate the Service Carving Time to other partners END --- 2. When a new PE is inserted or an existing PE device Unqualified verb: a new PE is inserted where? Probably "inserted in an Ethernet segment". Incomplete clause: an existing PE device is what? Possibly "booted up" in line with the text after Figure 1. --- 2. Upon reception of that new BGP Extended Community, partner PEs can determine exactly the anticipated carving time. s/reception/receipt/ Why "exactly"? Is there some doubt that other mechanisms might not be exact? How "anticipated"? Is this still guesswork? Maybe just... ...can determine the carving time. --- 2.1 s/needs to be defined/is defined/ s/along with Ethernet/along with the Ethernet/ s/as a 8-octet/as an 8-octet/ s/the NTP protocol/NTP/ --- 2.1 * Timestamp Fractional Seconds: 16 bits of the NTP fractional seconds are encoded in this field. The use of a 16-bit fractional seconds yields adequate precision of 15 microseconds (2^-16 s). Which 16 bits? I think "the high order 16 bits" --- 2.1 Do you intend this solution to have a "bad time" when the NTP era transitions in 2036? Or do you expect the implementation to handle NTP wrapping? If so, you should probably point this out (because otherwise I can see some fun in the field). --- 2.1 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Type = 0x06 | Sub-Type(0x06)| RSV | DF Alg | |A| |T| ~ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ~ Bitmap | Reserved = 0 | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ * Bit 3: Time Synchronization (corresponds to Bit 27 of the DF Election Extended Community). When set to 1, it indicates the desire to use Time Synchronization capability with the rest of the PEs in the Ethernet Segment. Why bit 3? I know the answer is "because that's the bit our implementation uses" and I'm fine with that and I'm not asking for any change, but it makes me suspicious about what happened to bits 0 and 2. Why aren't they used? Is someone squatting on them? --- 2.2 9.1 Where SCT timestamp is present on the RECV_ES event of Action 11, wait the remaining time before continuing to 9.2. You could usefully clarify "the remaining time". I think you are saying that the implementation should "wait until the time indicated by the timestamp in the (with the possible reduction by the skew value) is in the past". This would also cover you against the transmission of timestamps that are already in the past or which go into the past when the skew is deducted. --- 2.2 Why do you show: 10. DF_DONE on exiting the state: If a new DF election is triggered and the current DF is lost, then assume an NDF for the local PE for the VLAN or VLAN Bundle. 11. DF_DONE on VLAN_CHANGE, RCVD_ES, or LOST_ES: Transition to DF_CALC. As far as I can see, these are unchanged from 8584/2.1 and showing them in this section is confusing. --- What's a "peering timer"? I think you have assigned a name to the timer described in step 2 of section 8.5 of RFC 7432. No reason not to give it a name, but perhaps you should make it nice and clear what you are talking about? So... In the examples in section 3 (thanks for these), it looks like the value of the SCT and the value of the peering timer are coincidentally "now+3" and 3 respectively. In fact they are both based on the value of the timer configured for all PEs on the Ethernet Segment. This could be clearer. --- 4. PEs running a baseline DF election mechanism will simply discard the new SCT BGP extended community as unrecognized. Would be useful to give a reference for that behavior. --- 4. A PE can indicate its willingness to support clock-synched carving by signaling the new 'T' DF Election Capability as well as including the new Service Carving Time BGP extended community along with the Ethernet Segment Route (Type-4). In the case where one or more PEs attached to the Ethernet Segment do not signal T=1, all PEs in the Ethernet Segment SHALL revert back to the [RFC7432] timer approach. This is especially important in the context of the VLAN shuffling with more than 2 PEs. I see some challenges here. The first is when a PE joins the segment while DF election is ongoing among the other PEs. The second is what happens if the SCT BGP extended community is present when the T-bit is not set. The third is what happens if the T-bit is set but no SCT BGP extended community is provided. All of these are easy to describe. --- 5. Is there a "downgrade attck" achieved by causing a T-bit to be clear? --- 5. This document uses MPLS and IP-based tunnel technologies to support data plane transport. Security considerations described in [RFC7432] and in [RFC8365] are equally applicable. I don't think this document is concerned with the data plane transport. On the other hand, you should describe the impact of any attacks on the new protocol elements (for example, what happens if the SCT is set to a value that is many hours ahead?). Of course, your escape clause remains that the message must be protected in the same way that other messages (in 7432 and 8584) are protected, but you should still describe the risks. --- 6. Looks like the SCT extended community has already been assigned. So, you need... OLD This document solicits the allocation of the following sub-type in the "EVPN Extended Community Sub-Types" registry setup by [RFC7153]: 0x0F Service Carving Timestamp This document NEW IANA maintains the "EVPN Extended Community Sub-Types" registry set up by [RFC7153]. IANA is requested to confirm the First Come First Served assignment as follows: Sub-Type Value | Name | Reference | DATE ---------------+---------------------------+---------------+------ 0x0F | Service Carving Timestamp | This document | TBD IANA should replace the field TBD with the date of publicaton of this document as an RFC. END --- 6. Just a little political correctness... OLD This document solicits the allocation of the following values in the "DF Election Capabilities" registry setup by [RFC8584]: Bit Name Reference ---- ---------------- ------------- 3 Time Synchronization This document NEW IANA maintains the "DF Election Capabilities" registry set up by [RFC8584]. IANA is requested to make the following assignment from this registry: Bit Name Reference Date ---- ---------------- ------------- ---- 3 Time Synchronization This document TBD IANA should replace the field TBD with the date of publicaton of this document as an RFC. END