Request for Review: Dual-Loop Predictive Latency Controller Tested over BBRv1


Jin-Hyeong Lee

Dec 3, 2025, 8:13:13 AM
to BBR Development

Dear BBR Development Group,

I am writing to share a recent technical note detailing Proactive Latency Control (PLC), a lightweight dual-loop predictive latency controller designed as a supervisory wrapper over BBRv1.

This work addresses the predictable yet disruptive RTT inflation caused by handovers in Low Earth Orbit (LEO) satellite networks, which current delay-reactive controllers often misinterpret as congestion. I have so far evaluated PLC only on a BBRv1-like implementation, and I would greatly appreciate your feedback before extending the evaluation to BBRv2/v3.

Summary of PLC (Adaptive Supervisory Layer)

PLC is an O(1) dual-loop adaptation layer that operates strictly as a shim above unmodified BBR, without touching its internal state machine:

  1. Fast Loop (Anticipatory): Pre-intervention rate modulation based on RTT change, responding in ≤ 1 RTT.

  2. Slow Loop (Homeostatic): Adaptive self-damping (200–400 ms time scale) using a Lyapunov-inspired, bounded-log multiplicative gain update (K_p) for stability and oscillation suppression.

[Figure 1: dual-loop architecture (Figure1_dual_loop.png)]
Key Results & Integration Challenge (BBRv1-like)

Under heavy-load LEO handover dynamics (up to 75 handovers per 500 s), PLC demonstrates significant stabilization:

  • Latency Stabilization: Reduced 700–1,200 ms bufferbloat oscillations to a 100–150 ms regime.

  • Metric Improvement: Mean latency reduced by 300–326 ms, and p99 by 21–45 ms.

  • Efficiency: Computational overhead remains minimal (~36 μs per 10 ms cycle).

The critical challenge identified is a catastrophic interaction during asymmetric STARTUP (leading to >31 s latency spikes), which makes Active Queue Management (AQM) such as CoDel a mandatory failsafe. The root cause is the incompatibility between PLC's rapid damping and BBR's exponential probing in that phase.

Request for Guidance on BBR v2/v3 Integration

Before investing time in BBR v2/v3 testing, I seek your expert guidance on the following:

  1. Predictive/Dual-Loop History: Have similar supervisory dual-loop or predictive wrappers been evaluated internally for BBR, particularly in overcoming RTT inflation events?

  2. Safe STARTUP Integration: Given the observed incompatibility, what is the recommended, phase-aware approach for safely integrating external rate damping during BBR's STARTUP phase? (e.g., completely disabling the wrapper until the first PROBE_BW transition).

  3. Pacing/Rate Modulation: Are there known issues or non-linear effects when applying an external multiplicative rate modulation (outside of BBR's internal control) on top of BBR's highly tuned pacing layer?

  4. Measurement Methodology (BBRv3): What are the best practices for handling RTT/bandwidth sampling windows (especially min_rtt aging) to accurately capture performance differences when introducing external control?

A PDF containing detailed methodology, formulas, and reproducibility details is attached (DOI: 10.36227/techrxiv.176281113.30584908/v2).

Thank you very much for your time and for maintaining such a technically rigorous community.

Best regards,

Jin-Hyeong Lee

Lee_JH_2025_Medical_Decision_Inspired_PLC_v_2.pdf

Maksim Lanies

Dec 3, 2025, 10:35:15 AM
to Jin-Hyeong Lee, BBR Development

Dear Dr. Jin-Hyeong Lee,

Thank you for sharing your excellent work on Proactive Latency Control (PLC) for LEO satellite networks. We have thoroughly reviewed your TechRxiv publication (DOI: 10.36227/techrxiv.176281113.30584908/v2) and find your dual-loop architecture both elegant and highly relevant to our ongoing research.

We are CloudBridge Research, a network protocol research group based in Russia, focusing on QUIC optimization for inter-regional and mobile networks. Your PLC approach addresses a fundamental problem we have also encountered: BBR's misinterpretation of predictable RTT spikes as congestion.

Our Research Context

We have developed quic-test [1], an open-source QUIC testing platform with the following capabilities:

Technical Infrastructure

  • BBRv2/BBRv3 Implementation: Modified quic-go fork with experimental congestion control
  • Forward Error Correction (FEC): XOR/RS codes with AVX2 optimization
    • Recovery rate: 80-90% for single packet losses
    • Overhead: 10-20% additional bandwidth
  • Automated Test Matrix: 1,260 tests across RTT/load/connection scenarios
  • Network Emulation: Linux tc (netem) for realistic conditions
  • Monitoring Stack: Prometheus metrics + Grafana dashboards
  • Real-world Traces: Moscow ↔ Frankfurt (~50ms RTT), Moscow ↔ US (~150ms RTT)

Key Research Findings [2]

Our October 2025 laboratory research revealed:

Scenario             CUBIC       BBRv2       Improvement
Low RTT (5 ms)       95.2 Mbps   98.1 Mbps   +3.0%
Medium RTT (50 ms)   78.5 Mbps   112.3 Mbps  +43.1%
High RTT (200 ms)    45.2 Mbps   89.7 Mbps   +98.5%

ACK-Frequency Optimization:

  • Optimal frequency: 2-3 (15-25% overhead reduction)
  • Latency improvement: 10-20%

FEC Performance:

  • Critical for mobile networks (4G/5G) with 5% packet loss
  • Minimal latency impact
  • Significant reliability improvement in high-loss scenarios

GitHub Repository: https://github.com/twogc/quic-test
Research Website: https://cloudbridge-research.ru/research


Why PLC Resonates with Our Work

Your PLC architecture addresses three challenges we face in our deployment scenarios:

1. Mobile Network Handovers (4G/5G)

While your focus is LEO satellites, we work extensively with mobile networks where handover patterns are remarkably similar:

  • Predictable timing: Cell tower transitions every 10-30 seconds
  • Transient RTT spikes: +50-200ms during handover
  • Burst packet loss: 5-15% during transition

Your RTT-based detection mechanism (ρ = RTT_short / RTT_long > 1.20) appears directly applicable to terrestrial mobile handovers.
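For concreteness, here is a minimal sketch of such a detector applied to mobile handovers, assuming two EWMA filters with illustrative smoothing constants (the note's actual short/long filter design may differ):

```go
package main

import "fmt"

// handoverDetector sketches the rho = RTT_short / RTT_long > 1.20 test
// using two EWMA filters; the smoothing constants are illustrative.
type handoverDetector struct {
	short, long float64 // EWMA RTT estimates in ms
}

// Sample folds in a new RTT measurement and reports whether the
// short-horizon estimate has inflated past 1.20x the long baseline.
func (d *handoverDetector) Sample(rttMs float64) bool {
	const aShort, aLong = 0.3, 0.02 // fast vs slow smoothing factors
	d.short += aShort * (rttMs - d.short)
	d.long += aLong * (rttMs - d.long)
	return d.short/d.long > 1.20
}

func main() {
	d := &handoverDetector{short: 50, long: 50}
	for i := 0; i < 10; i++ {
		d.Sample(50) // steady state: no trigger
	}
	fired := false
	for i := 0; i < 10; i++ { // simulated +150 ms handover spike
		if d.Sample(200) {
			fired = true
		}
	}
	fmt.Println("handover detected:", fired)
}
```

Because the slow filter barely moves during a short spike, the ratio fires within a couple of samples of the RTT inflation, which matches the +50-200 ms transients we see on cell transitions.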

2. Inter-Regional Links

Our RU↔EU routes experience:

  • Variable RTT (40-80ms) due to routing changes
  • Periodic congestion events (predictable patterns)
  • BBR's reactive behavior causing unnecessary rate reductions

PLC's anticipatory control could significantly improve stability.

3. Connection Migration (QUIC Feature)

QUIC supports connection migration (Wi-Fi ↔ LTE), which causes similar RTT inflation patterns to satellite handovers. PLC could enhance this critical feature.


Proposed Collaboration

We are interested in three research directions and would value your guidance:

Direction 1: PLC Validation on BBRv2/v3

Background:
Your PLC was tested on a BBRv1-like implementation. BBRv2 introduced several improvements:

  • Less aggressive STARTUP phase (refined pacing_gain schedule)
  • Improved RTT estimation (better filtering)
  • Enhanced PROBE_RTT mechanism

Research Question:
Does BBRv2's less aggressive STARTUP reduce the catastrophic failure scenario you observed (31,541ms latency spike during asymmetric initialization)?

Our Proposal:

  1. Integrate PLC into our quic-test platform
  2. Validate PLC behavior on BBRv1 (baseline reproduction)
  3. Test PLC on BBRv2 and BBRv3
  4. Measure phase-dependent behavior across all versions
  5. Quantify STARTUP conflict severity: BBRv1 vs BBRv2 vs BBRv3

Expected Outcome:
Determine if BBRv2/v3's architectural improvements naturally mitigate the phase-conflict issue, potentially reducing CoDel dependency.

Timeline: 2-3 weeks


Direction 2: PLC + FEC Integration

Hypothesis:
Combining PLC's anticipatory rate control with our FEC's loss recovery could provide synergistic benefits during handover events.

Proposed Mechanism:

Pre-handover (t-2s):
  PLC detects ρ > 1.20 → reserves capacity (12-22% reduction)
  
Handover start (t+0s):
  PLC reduces rate preemptively
  FEC enabled with 15% overhead
  
Handover peak (t+0.2s):
  FEC recovers 80-90% of burst losses
  PLC's pre-intervention band prevents queue buildup
  
Post-handover (t+1s):
  PLC adaptive gain restores rate
  FEC overhead reduced to 10%
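The adaptive-overhead step in the timeline above could be sketched as a simple mapping from the RTT ratio to a redundancy level. The thresholds and percentages below are illustrative assumptions for discussion, not measured operating points:

```go
package main

import "fmt"

// fecOverhead sketches the adaptive-overhead idea: hold FEC at a 10%
// baseline and raise it to 15-20% while PLC signals a handover.
// All thresholds and percentages here are illustrative assumptions.
func fecOverhead(rho float64) int {
	switch {
	case rho > 1.50: // deep RTT inflation: handover peak
		return 20
	case rho > 1.20: // PLC pre-intervention band
		return 15
	default:
		return 10 // steady-state baseline
	}
}

func main() {
	for _, rho := range []float64{1.0, 1.3, 1.8} {
		fmt.Printf("rho=%.1f -> FEC overhead %d%%\n", rho, fecOverhead(rho))
	}
}
```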

Research Questions:

  1. Does FEC reduce the need for aggressive PLC rate reduction?
  2. Can PLC's handover detection trigger adaptive FEC overhead (10% → 20% during handover)?
  3. What is the combined improvement in jitter and packet loss?

Our Contribution:
We can provide our FEC implementation (C++/AVX2 optimized) for integration testing.

Timeline: 3-4 weeks


Direction 3: Mobile Network Extension

Research Question:
Can PLC's RTT-based detection generalize to terrestrial mobile handovers (4G/5G, Wi-Fi ↔ LTE)?

Proposed Validation:

  1. Collect real 4G/5G handover traces (we have access to mobile network testbeds)
  2. Characterize handover patterns (RTT spike magnitude, duration, frequency)
  3. Calibrate PLC parameters (ρ threshold, reserve_pct) for mobile scenarios
  4. Compare with LEO results

Expected Outcome:
Demonstrate PLC's applicability beyond LEO satellites, expanding its impact to billions of mobile users.

Timeline: 4-6 weeks


Technical Questions

Before proceeding, we would appreciate your insights on several technical aspects:

Q1: Phase-Aware BBR Integration

You mentioned the need for phase-aware integration, which requires exposing BBR's internal state machine (STARTUP, DRAIN, PROBE_BW, PROBE_RTT).

Question:
Have you explored standardizing this API? For example:

// BBRState exposes read-only BBR internals to a supervisory wrapper.
type BBRState interface {
    GetPhase() Phase        // STARTUP, DRAIN, PROBE_BW, or PROBE_RTT
    GetPacingGain() float64 // current pacing_gain multiplier
    GetCwndGain() float64   // current cwnd_gain multiplier
}

This would enable PLC to:

  • Disable damping during STARTUP
  • Boost intervention during PROBE_BW
  • Coordinate with PROBE_RTT
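A self-contained sketch of how a wrapper might gate its damping through such a hypothetical interface (the stub implementation and the per-phase gains are illustrative, not an existing quic-go API):

```go
package main

import "fmt"

// Phase enumerates BBR's states; this mirrors the interface proposed
// above and is an illustrative sketch, not an existing quic-go API.
type Phase int

const (
	Startup Phase = iota
	Drain
	ProbeBW
	ProbeRTT
)

type BBRState interface {
	GetPhase() Phase
	GetPacingGain() float64
	GetCwndGain() float64
}

// plcGate returns the wrapper's damping strength for the current phase:
// fully disabled in STARTUP and PROBE_RTT, full strength in PROBE_BW.
func plcGate(s BBRState) float64 {
	switch s.GetPhase() {
	case Startup, ProbeRTT:
		return 0.0 // do not fight exponential probing or min_rtt sampling
	case ProbeBW:
		return 1.0 // normal supervisory damping
	default:
		return 0.5 // conservative damping during DRAIN
	}
}

// fakeBBR is a stub implementation for demonstration only.
type fakeBBR struct{ phase Phase }

func (f fakeBBR) GetPhase() Phase        { return f.phase }
func (f fakeBBR) GetPacingGain() float64 { return 1.0 }
func (f fakeBBR) GetCwndGain() float64   { return 2.0 }

func main() {
	fmt.Println(plcGate(fakeBBR{Startup}), plcGate(fakeBBR{ProbeBW}))
}
```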

Our Contribution:
We can implement this in quic-go and propose it as a standard interface for congestion control wrappers.

Q2: CoDel Dependency in Production QUIC Stacks

Your results show CoDel is essential for safety (31,541ms → 119ms containment).

Question:
Most production QUIC stacks (Chromium, quic-go, LSQUIC) operate in user-space without kernel-level AQM. How would you recommend deploying PLC in these environments?

Options we're considering:

  1. Application-layer AQM: Implement CoDel-like logic in QUIC stack
  2. Gateway-side deployment: PLC + CoDel on network gateway
  3. Hybrid approach: PLC in application, CoDel in gateway

Which approach do you recommend?

Q3: Multi-Flow Scalability

Your multi-flow tests used 2 flows with asymmetric start.

Question:
We routinely test with 4-16 concurrent connections. Are there known scalability limits or fairness issues at higher flow counts?

Specific concerns:

  • Does PLC maintain fairness under per-flow fair queueing (FQ)?
  • How does PLC behave when flows have different RTTs (e.g., 50ms vs 200ms)?
  • Is there a recommended maximum flow count per PLC instance?

Q4: FEC Integration Considerations

Question:
Have you considered integrating Forward Error Correction with PLC? Specifically:

  1. Could PLC's handover detection trigger adaptive FEC overhead?
  2. Would FEC's loss recovery reduce the need for aggressive rate reduction?
  3. Are there potential conflicts between PLC's rate modulation and FEC's bandwidth overhead?

Q5: Real-World Deployment Experience

Question:
Have you had opportunities to test PLC on actual LEO terminals (Starlink, OneWeb, Kuiper)? We would be very interested in any field validation results or deployment challenges you've encountered.


What We Can Offer

1. Testing Infrastructure

  • quic-test platform: Ready for PLC integration
  • Automated test matrix: 1,260+ test scenarios
  • Real network traces: RU↔EU, RU↔US, mobile networks
  • Monitoring stack: Prometheus + Grafana

2. FEC Implementation

  • Source code: C++/AVX2 optimized
  • Performance: 80-90% recovery, 10-20% overhead
  • Integration: Ready for QUIC datagram layer

3. Mobile Network Access

  • 4G/5G testbeds: Real handover traces
  • Wi-Fi ↔ LTE scenarios: Connection migration testing
  • Diverse conditions: Urban, suburban, rural

4. Publication Support

  • Technical blog: https://cloudbridge-research.ru (Russian tech community)
  • Habr.com: Large Russian developer audience (100k+ readers)
  • GitHub: Open-source release with full reproducibility

Proposed Pilot Project

To validate mutual interest and technical feasibility, we propose a small pilot:

Phase 1: Baseline Validation (Week 1-2)

Goal: Reproduce your BBRv1 results

Tasks:

  1. Implement PLC wrapper in quic-test
  2. Replicate your Heavy Load scenario (75 handovers/500s)
  3. Validate key metrics:
    • Mean latency: 848ms → 544ms (target: -304ms)
    • p99 latency: 1154ms → 1109ms (target: -45ms)
    • Compliance: 0% → 8% (target: +8pp)

Deliverable: Technical report comparing our results with yours

Phase 2: BBRv2 Extension (Week 3-4)

Goal: Test PLC on BBRv2

Tasks:

  1. Integrate PLC with BBRv2 implementation
  2. Measure phase-dependent behavior (STARTUP vs PROBE_BW)
  3. Quantify STARTUP conflict severity vs BBRv1

Deliverable: Comparative analysis (BBRv1 vs BBRv2)

Phase 3: Public Release (Week 5-6)

Goal: Share results with community

Tasks:

  1. Prepare joint blog post (English + Russian)
  2. Release PLC integration in quic-test (GitHub)
  3. Publish results on cloudbridge-research.ru

Deliverable: Public announcement with full reproducibility

Total Duration: 6 weeks
Commitment: 20-30 hours/week from our team
Cost: None (open collaboration)


Next Steps

If you're interested in exploring this collaboration, we suggest:

Immediate (This Week)

  1. Initial call: 30-minute video discussion to align on goals
  2. Technical review: Share our quic-test architecture for your feedback
  3. Scope agreement: Finalize pilot project scope

Short-term (Weeks 1-2)

  1. PLC integration: Begin implementation in quic-test
  2. Baseline validation: Reproduce your BBRv1 results
  3. Progress updates: Weekly email summaries

Medium-term (Weeks 3-6)

  1. BBRv2/v3 testing: Extended validation
  2. FEC integration: If pilot succeeds
  3. Public release: Joint announcement

About CloudBridge Research

We are a network protocol research group focused on:

  • QUIC optimization: BBRv2/v3, FEC, ACK-Frequency
  • Inter-regional connectivity: RU↔EU, RU↔US routes
  • Mobile networks: 4G/5G handover resilience
  • Open-source tools: quic-test, AI Routing Lab, MASQUE VPN

Team:

  • Network engineers with production QUIC deployment experience
  • Researchers with academic backgrounds in congestion control
  • Open-source contributors to quic-go ecosystem

Publications:

  • Habr.com (Russian tech blog): 4 articles on QUIC/BBR (10k+ views)
  • GitHub: 3 open-source projects (quic-test, ai-routing-lab, masque-vpn)
  • Research website: https://cloudbridge-research.ru

Contact:


Closing

Your PLC work represents a significant advancement in latency-sensitive congestion control. We believe our complementary expertise (BBRv2/v3, FEC, mobile networks) could extend PLC's impact beyond LEO satellites to terrestrial networks serving billions of users.

We would be honored to collaborate with you on validating and extending this important work. Please let us know if you're interested in discussing this further.

Looking forward to your response!

Best regards,

Maksim Lanies


References:

[1] quic-test: https://github.com/twogc/quic-test
[2] Experimental QUIC Laboratory Research Report (October 2025): https://cloudbridge-research.ru/research
