Note that to the extent that the network round-trip path, including the receiver, acknowledges data (cumulatively or selectively) as soon as possible, there will be no aggregation effects: all data will be acknowledged smoothly, at exactly the rate at which it was transmitted.
Due to speed-of-light limits, it is not possible for a receiver to SACK data "much faster than the sending rate" unless some part of the network path earlier delayed the transmission or generation of packets (or the receiver is misbehaving and violating the protocol spec by ACKing data that has not yet arrived).
Aggregation effects happen when something in the path first (a) delays the generation, transmission, or processing of data packets or ACK packets, and then (b) releases a burst of that delayed work by pulling from some queue of delayed packets, at a rate that is potentially faster than the long-term data rate.
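To make the "delay then burst" pattern concrete, here is a toy model (mine, not from any real stack): packets are generated with smooth 1 ms spacing, but a hypothetical link-layer gate holds them and flushes everything queued every 8 ms, so smooth arrivals turn into periodic bursts.

```python
def aggregate(arrival_times_ms, gate_period_ms):
    """Return (release_time_ms, burst_size) pairs for a periodic gate
    that queues packets and flushes the whole queue when it fires."""
    releases = []
    queue = 0
    next_gate = gate_period_ms
    for t in arrival_times_ms:
        while t >= next_gate:           # gate fires: flush the queue
            if queue:
                releases.append((next_gate, queue))
                queue = 0
            next_gate += gate_period_ms
        queue += 1                      # packet waits for its turn
    if queue:                           # flush whatever is left
        releases.append((next_gate, queue))
    return releases

smooth = list(range(16))                # one packet per ms, 0..15 ms
print(aggregate(smooth, 8))             # → [(8, 8), (16, 8)]
```

Sixteen smoothly spaced packets come out as two bursts of eight; the long-term rate is unchanged, but the instantaneous arrival rate during each burst is 8x the sending rate, which is exactly what an ACK-rate-sensing algorithm has to tolerate.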
Because many OSes, when receiving out-of-order data, disable delayed ACKs and expedite the generation of ACKs with SACK blocks, out-of-order arrivals tend to reduce the degree of aggregation: there is less delay in the handling of packets, and thus less opportunity for delayed work to happen in a burst.
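The receiver-side rule above can be sketched roughly as follows (the function and its return strings are mine, for illustration; real stacks differ in details):

```python
def ack_decision(seg_start, rcv_nxt, delayed_ack_pending):
    """Simplified receiver ACK policy: out-of-order data forces an
    immediate ACK carrying SACK blocks; in-order data may wait behind
    a delayed-ACK timer (or ACK every second segment)."""
    if seg_start != rcv_nxt:
        return "immediate ACK with SACK blocks"   # hole: ACK right away
    if delayed_ack_pending:
        return "immediate ACK"                    # ack every 2nd segment
    return "schedule delayed ACK"                 # in-order: may delay

print(ack_decision(3000, 1000, False))  # gap before this segment
print(ack_decision(1000, 1000, False))  # in-order arrival
```

The key point for aggregation is the first branch: during loss recovery the receiver is typically ACKing *more* eagerly than in steady state, not less.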
Our experience in testing and trace analysis of real public Internet and datacenter traffic suggests that the biggest causes of aggregation are effects below the TCP layer:
+ L2/link-layer mechanisms, where some link-layer technologies must queue packets while they wait for their turn to transmit on a shared medium that is time-multiplexed between different senders; cellular, wifi, and DOCSIS links behave this way
+ offload mechanisms, where hardware (TSO/LRO) or software (GSO, GRO, driver) offload mechanisms build batches of packets and release them in a burst
Those mechanisms are mostly agnostic to the sequence numbers of the TCP packets (since they happen at layers below TCP), so they do not care whether data is arriving out of sequence order or whether ACKs contain SACK blocks, and thus will not change their aggregation behavior in scenarios with SACKs. For the mechanisms that are not agnostic to sequence numbers, like LRO/GRO receiver aggregation or driver/qdisc ACK decimation algorithms, out-of-order packets tend to trigger immediate action, which tends to reduce opportunities for the kind of delay/burst pairing that is involved in aggregation effects.
The one widely-deployed mechanism that I'm aware of that potentially increases the degree of aggregation in recovery scenarios is the Linux TCP SACK compression mechanism added by Eric Dumazet in 2018:
However, note that even that mechanism limits the added delay to 1ms, which is the typical degree of aggregation from Linux TCP TSO/GSO anyway. So it typically does not increase the overall degree of aggregation during recovery; rather, it makes the generation of ACKs with SACK blocks more closely reflect the typical degree of aggregation.
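A rough sketch of the SACK-compression timing, as I understand the Linux behavior (the ~rtt/20 factor and the 1ms cap mirror the defaults; treat the exact constants as assumptions):

```python
COMP_SACK_DELAY_CAP_NS = 1_000_000   # 1 ms cap, like tcp_comp_sack_delay_ns

def sack_delay_ns(srtt_ns):
    """Delay sending a SACK by roughly rtt/20, so several out-of-order
    arrivals can be coalesced into one SACK, but cap the added delay
    at 1 ms so it never exceeds typical TSO/GSO aggregation."""
    return min(srtt_ns // 20, COMP_SACK_DELAY_CAP_NS)

print(sack_delay_ns(10_000_000))    # 10 ms RTT  → 500_000 ns (0.5 ms)
print(sack_delay_ns(100_000_000))   # 100 ms RTT → capped at 1_000_000 ns
```

Because the cap matches the sender-side TSO/GSO burst granularity, the compressed SACK stream looks to the sender much like the ACK stream it already sees in the non-recovery case.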
As a result of all these considerations, in our experience in testing and trace analysis, we have generally not seen higher degrees of aggregation in fast recovery scenarios.
Are you seeing something different in testing or trace analysis? Or was your concern mainly theoretical?
Thanks,
neal