This document has been reviewed as part of the transport area review team's ongoing effort to review key IETF documents. These comments were written primarily for the transport area directors, but are copied to the document's authors and WG to allow them to address any issues raised and also to the IETF discussion list for information. When done at the time of IETF Last Call, the authors should consider this review as part of the last-call comments they receive. Please always CC tsv-art@ietf.org if you reply to or forward this review.

During WGLC, I reviewed the WHIP protocol document:
https://mailarchive.ietf.org/arch/msg/wish/-ia_-FrZyJh8TRuD64hlvyfNoB0/
The comments described there have subsequently been addressed, so I have no further feedback to add on the topics of my original review.

However, there is an "Elephant in the Room" issue that was not brought up in my original review, but which has come up in implementations, so I feel it needs to be at least mentioned in the document. The issue relates to the congestion control algorithms applicable to ingestion scenarios.

Currently, libwebrtc implements Google Congestion Control (gcc), which makes sense for WebRTC's primary use case (browser-based communications, including simulcast). So when WHIP is implemented in a browser, this is the CC algorithm that will be used. In such a scenario gcc has some unique capabilities (simulcast layer add/drop) but also some downsides. For example, where WebRTC has been used for ingestion in mobile applications such as YouTube, the congestion control has been tuned more toward maximizing throughput than toward minimizing latency.
The following video by Ying Yin of YouTube describes how Google handled the tradeoff between quality and latency in the design of their WebRTC ingestion system:
https://www.youtube.com/watch?v=htN-gIPOkP0

There is another data point suggesting that simple ingestion use cases with a single audio/video stream may benefit from alternatives to the CC approach implemented in libwebrtc. The MoQ WG has been developing protocols for both ingestion and distribution. On the ingestion side, initial implementations (e.g. Meta's RUSH) have relied on the algorithms implemented in QUIC stacks (typically CUBIC or BBRv2), and appear to be able to deliver high quality media within the constraints applied (e.g. only a single audio and video stream ingested).

Given that ingestion implementations in mobile applications appear to prefer alternatives to the gcc algorithms implemented in libwebrtc, I think there needs to be some discussion of CC algorithms in the document, if only to point out avenues for further investigation (e.g. L4S, BBRv3, etc.). I understand that it is probably not feasible to come up with a definitive recommendation at this time.

One complicating factor is that WHIP supports more complex ingestion use cases, such as simulcast ingestion. Simulcast is something that gcc excels at (e.g. in how it decides to add or drop layers), due to its built-in probing functionality. Since simulcast support is not likely to be a mainstream ingestion feature (e.g. it is not supported in RUSH), this probably didn't carry much weight in the designs cited earlier. Handling simulcast congestion control within a QUIC stack is quite tricky because it may require integration with the application (e.g. libwebrtc uses RTX probes, which are specific to RTP).