I am the assigned IoT-Directorate reviewer for draft-ietf-babel-rtt-extension and reviewed version 05. I found the document to be almost ready, with a few possible additions and changes that would make the reader's journey more agreeable. Please find my comments below.


Abstract

> The abstract is missing a reference to the updated document.


1.  Introduction

   The Babel routing protocol [RFC8966] does not mandate a specific
   algorithm for computing metrics; existing implementations use a
   packet-loss based metric on wireless links and a simple hop-count
   metric on all other types of links.  While this strategy works
   reasonably well in many networks, it fails to select reasonable
   routes in some topologies involving tunnels or VPNs.

> An applicability statement covering the various possibilities would be useful in the future; it could be a paper or an RFC. At the very least, it would make sense to have an applicability section here. For instance, IoT networks may experience large and asymmetric delays. Current Wi-Fi (pre-deterministic, i.e., before RAW) can experience short delays but relatively large delay variation. It is unclear whether this work could apply to either of these; it seems more targeted at long-range communication lines with stable and symmetrical delays.


                                                             However,
   the existing implementations of Babel consider both routes as having
   the same metric, and will therefore route the traffic through C in
   roughly half the cases.
   
   > There is prior art in IGPs for configuring metrics or deriving them from line speed. Is that available in current implementations? Note that in the given example, the speed of light will certainly have measurable effects. But a round trip to Orleans may be hidden inside, e.g., wireless delays.
   

   In this document, we specify an extension to the Babel routing
   protocol that enables precise measurement of the round-trip time
   (RTT) of a link, and allows its usage in metric computation.  Since
   this causes a negative feedback loop, special care is needed to
   ensure that the resulting network is reasonably stable (Section 4).


   > I am particularly concerned with the effect of bufferbloat, which could create oscillations exactly like those of the early ARPANET load-based metric.
   

   We believe that this protocol may be useful in other situations than
   the one described above, such as when running Babel in a congested
   wireless mesh network or over a complex link layer that performs its
   own routing; the fine granularity of the timestamps used (1µs) should
   make it possible to experiment with RTT-based metrics on this kind of
   link layers.
   
   
> I am not sure we want this text. It remains highly debatable until it has been experimented with; see current experiments with AR/VR over Wi-Fi, which suffer from variable lag.
   
   
3.2.  Protocol operation


   A Babel speaker periodically sends Hello messages to its neighbours
   (Section 3.4.1 of [RFC8966]).  Additionally, it occasionally sends a
   set of IHU messages, at most one per neighbour (Section 3.4.2 of
   [RFC8966]).


> Please define IHU ("I Heard You") on first use, and explain what it is for compared with Hello.

      A          B
        |      |
     t1 +      |
        |\     |
        | \    |
        |  \   |  Hello(t1)
        |   \  |
        |    \ |
        |     \|
        |      + t1'
        |      |
        |      |               RTT = (t2 - t1) - (t2' - t1')
		
		
> Reference IEEE 1588? There are many profiles for it; maybe this work could be framed as one of them.

It is important to indicate which timestamps are used (e.g., where in the stack t1 is measured). Do we measure the latency inside the sender, meaning that the timestamp is taken by the software above, or do we measure starting at MAC enqueue, or starting at PHY transmission?

For short distances and the high precision claimed in the introduction, setting the timestamps in the message itself is hard, and the standards often allow a second message to carry the timestamp that the hardware recorded for the first transmission. Depending on the answer above, this may or may not be needed.


   In order to enable the computation of RTTs, a node A MUST include in
   every Hello that it sends a timestamp t1 (according to A's local
   clock), as illustrated in Figure 2.  When a node B receives A's
   timestamped Hello, it computes the time t1' at which the Hello was
   received (according to B's local clock).  It then MUST record the
   value t1 in the Origin Timestamp field of the Neighbour Table entry
   corresponding to A, and the value t1' in the Receive Timestamp field
   of the Neighbour Table entry.
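To make the arithmetic of Figure 2 concrete, here is a minimal Python sketch of the formula RTT = (t2 - t1) - (t2' - t1'). It is not taken from the draft: it assumes 32-bit timestamps in microseconds that wrap around, and the function names are hypothetical. Each difference is taken on a single node's clock, which is why the unknown offset between A's and B's clocks cancels out.

```python
# Sketch only: assumes 32-bit microsecond timestamps that wrap around.
TIMESTAMP_MODULUS = 2 ** 32  # hypothetical modulus, not quoted from the draft


def wrap_diff(later: int, earlier: int) -> int:
    """Return later - earlier on a wrapping 32-bit clock."""
    return (later - earlier) % TIMESTAMP_MODULUS


def compute_rtt(t1: int, t1_prime: int, t2_prime: int, t2: int) -> int:
    """RTT = (t2 - t1) - (t2' - t1').

    t1 and t2 are read on A's clock; t1' and t2' are read on B's clock.
    """
    elapsed_at_a = wrap_diff(t2, t1)           # total time observed by A
    held_at_b = wrap_diff(t2_prime, t1_prime)  # time B held before replying
    return elapsed_at_a - held_at_b


# B's clock is offset by 1 s, each direction takes 500 us, B holds for 2 s.
print(compute_rtt(t1=100, t1_prime=1_000_600, t2_prime=3_000_600, t2=2_001_100))
```

The example prints 1000 (µs), i.e., twice the assumed one-way delay, regardless of the 1 s clock offset.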
   
> Do we need a sequence counter to filter out IHU answers that were delayed (e.g., by bufferbloat) and are received out of sync?
   

   In principle, this algorithm is inaccurate in the presence of clock
   drift (i.e., when A's and B's clocks are running at different
   frequencies).  However, t2' - t1' is usually on the order of seconds,
   and significant clock drift is unlikely to happen at that time scale.
   
> This brings me back to the applicability of the work. I believe some expectations on clock drift versus RTT can be set for modern hardware: nodes know which clock they use and what drift it has. The draft could recommend that the clock error be two orders of magnitude smaller than the RTTs that the protocol measures; otherwise, the measurement cannot be trusted.
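To illustrate the suggested rule, here is a small Python sketch. It relies on the fact that a relative drift of 1 ppm accumulates 1 µs of error per second, so the error on the subtracted hold time t2' - t1' is roughly drift (ppm) times hold time (s), in µs. The helper names, the 4 s hold time and the factor of 100 are assumptions for illustration, not values from the draft.

```python
# Sketch of the "clock error two orders of magnitude below the RTT" rule.
# Helper names and the default margin are hypothetical.


def drift_error_us(relative_drift_ppm: float, hold_time_s: float) -> float:
    """Approximate RTT error (us) caused by relative clock drift.

    B measures t2' - t1' on its own clock; 1 ppm of relative drift
    accumulates 1 us of error per second of hold time.
    """
    return relative_drift_ppm * hold_time_s


def measurement_trustworthy(rtt_us: float, relative_drift_ppm: float,
                            hold_time_s: float, margin: float = 100.0) -> bool:
    """True if the drift error is at least `margin` times smaller than the RTT."""
    return drift_error_us(relative_drift_ppm, hold_time_s) * margin <= rtt_us


# 50 ppm over an assumed 4 s hold time gives about 200 us of error.
print(drift_error_us(50, 4))
print(measurement_trustworthy(10_000, 50, 4))  # 10 ms RTT: not trustworthy
print(measurement_trustworthy(50_000, 50, 4))  # 50 ms RTT: trustworthy
```

Under these assumptions, a 200 µs drift error would rule out trusting RTTs below 20 ms, which is the kind of bound an applicability section could state.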


   When a Hello TLV is buffered for transmission, we insert a PadN sub-
   TLV (Section 4.7.2 of [RFC8966]) with a length of 4 octets within the
   TLV.  When the packet is ready to be sent, we check whether it
   contains a 4-octet PadN sub-TLV; we then overwrite the PadN sub-TLV
   with a Timestamp sub-TLV with the current time, and send out the
   packet.
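The placeholder technique can be sketched in Python as follows. The Hello TLV type (4), the PadN sub-TLV type (1), the Hello body layout (Flags, Seqno, Interval in centiseconds) and the sub-TLV Type/Length framing follow RFC 8966. The Timestamp sub-TLV code point (3) is an assumption of this sketch, the packet is modelled as a bare sequence of TLVs without the Babel packet header, and the function names are hypothetical.

```python
import struct

HELLO_TLV_TYPE = 4         # Hello TLV (RFC 8966)
PADN_SUBTLV_TYPE = 1       # PadN sub-TLV (RFC 8966)
TIMESTAMP_SUBTLV_TYPE = 3  # assumed code point for the Timestamp sub-TLV


def buffer_hello(seqno: int, interval_cs: int) -> bytearray:
    """Build a Hello TLV carrying a 4-octet PadN placeholder sub-TLV."""
    body = struct.pack("!HHH", 0, seqno, interval_cs)     # Flags, Seqno, Interval
    placeholder = struct.pack("!BB4x", PADN_SUBTLV_TYPE, 4)  # PadN, 4 zero octets
    value = body + placeholder
    return bytearray(struct.pack("!BB", HELLO_TLV_TYPE, len(value)) + value)


def stamp_before_send(packet: bytearray, now_us: int) -> int:
    """Overwrite each 4-octet PadN in a Hello with a Timestamp sub-TLV.

    Returns the number of placeholders overwritten.
    """
    stamped = 0
    i = 0
    while i < len(packet):
        if packet[i] == 0:  # Pad1 TLV: a single octet, no length field
            i += 1
            continue
        tlv_type, tlv_len = packet[i], packet[i + 1]
        end = i + 2 + tlv_len
        if tlv_type == HELLO_TLV_TYPE:
            j = i + 2 + 6  # skip Flags, Seqno and Interval
            while j < end:
                if packet[j] == 0:  # Pad1 sub-TLV
                    j += 1
                    continue
                sub_type, sub_len = packet[j], packet[j + 1]
                if sub_type == PADN_SUBTLV_TYPE and sub_len == 4:
                    # Same size as the placeholder: 2 header octets + 4 octets.
                    packet[j:j + 6] = struct.pack(
                        "!BBI", TIMESTAMP_SUBTLV_TYPE, 4, now_us % 2 ** 32)
                    stamped += 1
                j += 2 + sub_len
        i = end
    return stamped
```

Because the Timestamp sub-TLV has exactly the size of the placeholder, the packet length does not change at send time. As noted below, this still stamps the packet in software, not at the PHY.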
   
   
> Hardware will not do that. This goes back to my earlier question of which step in the stack is relevant for this measurement. Surely any step whose duration depends on the load of the system (variable, but independent of the link being used), as opposed to the load on the transmission, should be excluded.


   Second, using the RTT signal for route selection gives rise to a
   negative feedback loop: when a route has a low RTT, it is deemed to
   be more desirable, which causes it to be used for more data traffic,
   which may lead to congestion, which in turn increases the RTT.
   Without some form of hysteresis, using RTT for route selection would
   lead to oscillations between parallel routes, which might lead to
   packet reordering and negatively affect upper-layer protocols (such
   as TCP).

> I believe this discussion should appear earlier in the text, e.g., in the introduction (not the solution, but at least the fact that the issue exists and is addressed by the protocol). See my earlier comment on ARPANET.


4.3.  Hysteresis

   Even after applying a bounded mapping from smoothed RTT to a cost
   value, the cost may fluctuate when a link's RTT is between rtt-min
   and rtt-max.  This is effectively mitigated by using a robust
   hysteresis algorithm, such as the one described in Appendix A.3 of
   [RFC8966].
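For illustration, here is a Python sketch of a bounded mapping from smoothed RTT to cost combined with a simple hysteresis. The linear interpolation between rtt-min and rtt-max, the parameter values and the threshold rule are all assumptions of this sketch; in particular, the threshold rule is not necessarily the algorithm of Appendix A.3 of [RFC8966], and the names are hypothetical.

```python
# Sketch only: parameter values and the hysteresis rule are illustrative.


def rtt_cost(smoothed_rtt_ms: float, rtt_min_ms: float = 10.0,
             rtt_max_ms: float = 120.0, max_penalty: int = 150) -> int:
    """Bounded mapping: 0 below rtt-min, max_penalty above rtt-max, linear between."""
    if smoothed_rtt_ms <= rtt_min_ms:
        return 0
    if smoothed_rtt_ms >= rtt_max_ms:
        return max_penalty
    fraction = (smoothed_rtt_ms - rtt_min_ms) / (rtt_max_ms - rtt_min_ms)
    return int(max_penalty * fraction)


class HysteresisSelector:
    """Switch routes only if the best one is cheaper by more than a threshold."""

    def __init__(self, threshold: int):
        self.threshold = threshold
        self.current = None

    def select(self, costs: dict) -> str:
        best = min(costs, key=costs.get)
        if self.current not in costs:
            self.current = best  # no current route yet, or it disappeared
        elif costs[best] + self.threshold < costs[self.current]:
            self.current = best  # improvement large enough to justify a switch
        return self.current


selector = HysteresisSelector(threshold=20)
print(selector.select({"B": 75, "C": 80}))   # B
print(selector.select({"B": 82, "C": 76}))   # still B: gain below threshold
print(selector.select({"B": 120, "C": 76}))  # C: gain exceeds threshold
```

The threshold absorbs small cost fluctuations, so two parallel routes with similar RTTs do not flap; only a large, sustained difference causes a switch.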
   
   
> If this is what solves the oscillation issue, please say so, and perhaps discuss how it does so. You could also provide references to existing papers on the matter and to your early trials.


8.  Security Considerations

> Maybe discuss the consequences of a man-in-the-middle (MITM) attacker that modifies the values, e.g., to discourage the Paris-to-Paris path and cause routing via Tokyo?