Request for Review: Dual-Loop Predictive Latency Controller Tested over BBRv1


Jin-Hyeong Lee

Dec 3, 2025, 8:13:13 AM
to BBR Development

Dear BBR Development Group,

I am writing to share a recent technical note detailing Proactive Latency Control (PLC), a lightweight dual-loop predictive latency controller designed as a supervisory wrapper over BBRv1.

This work addresses the predictable yet disruptive RTT inflation caused by handovers in Low Earth Orbit (LEO) satellite networks, which current delay-reactive controllers often misinterpret as congestion. I have so far evaluated PLC only on a BBRv1-like implementation, and I would greatly appreciate your feedback before extending the evaluation to BBRv2/v3.

Summary of PLC (Adaptive Supervisory Layer)

PLC is an O(1) dual-loop adaptation layer that operates strictly as a shim above unmodified BBR, without touching its internal state machine:

  1. Fast Loop (Anticipatory): Pre-intervention rate modulation based on RTT change, responding in ≤ 1 RTT.

  2. Slow Loop (Homeostatic): Adaptive self-damping (200–400 ms time scale) using a Lyapunov-inspired, bounded-log multiplicative gain update (K_p) for stability and oscillation suppression.

[Figure 1: dual-loop architecture (Figure1_dual_loop.png)]
Key Results & Integration Challenge (BBRv1-like)

Under heavy-load LEO handover dynamics (up to 75 handovers per 500 s), PLC demonstrates significant stabilization:

  • Latency Stabilization: Reduced 700–1,200 ms bufferbloat oscillations to a 100–150 ms regime.

  • Metric Improvement: Mean latency reduced by 300–326 ms, and p99 by 21–45 ms.

  • Efficiency: Computational overhead remains minimal (~36 μs per 10 ms cycle).

The critical challenge identified is a catastrophic interaction during asymmetric STARTUP (leading to >31 s latency spikes), which makes Active Queue Management (AQM) such as CoDel a mandatory failsafe. The root cause is the incompatibility between PLC's rapid damping and BBR's exponential probing in that phase.

Request for Guidance on BBR v2/v3 Integration

Before investing time in BBR v2/v3 testing, I seek your expert guidance on the following:

  1. Predictive/Dual-Loop History: Have similar supervisory dual-loop or predictive wrappers been evaluated internally for BBR, particularly in overcoming RTT inflation events?

  2. Safe STARTUP Integration: Given the observed incompatibility, what is the recommended, phase-aware approach for safely integrating external rate damping during BBR's STARTUP phase? (e.g., completely disabling the wrapper until the first PROBE_BW transition).

  3. Pacing/Rate Modulation: Are there known issues or non-linear effects when applying an external multiplicative rate modulation (outside of BBR's internal control) on top of BBR's highly tuned pacing layer?

  4. Measurement Methodology (BBRv3): What are the best practices for handling RTT/bandwidth sampling windows (especially min_rtt aging) to accurately capture performance differences when introducing external control?

A PDF containing detailed methodology, formulas, and reproducibility details is attached (DOI: 10.36227/techrxiv.176281113.30584908/v2).

Thank you very much for your time and for maintaining such a technically rigorous community.

Best regards,

Jin-Hyeong Lee

Lee_JH_2025_Medical_Decision_Inspired_PLC_v_2.pdf

Maksim Lanies

Dec 3, 2025, 10:35:15 AM
to Jin-Hyeong Lee, BBR Development

Dear Dr. Jin-Hyeong Lee,

Thank you for sharing your excellent work on Proactive Latency Control (PLC) for LEO satellite networks. We have thoroughly reviewed your TechRxiv publication (DOI: 10.36227/techrxiv.176281113.30584908/v2) and find your dual-loop architecture both elegant and highly relevant to our ongoing research.

We are CloudBridge Research, a network protocol research group based in Russia, focusing on QUIC optimization for inter-regional and mobile networks. Your PLC approach addresses a fundamental problem we have also encountered: BBR's misinterpretation of predictable RTT spikes as congestion.

Our Research Context

We have developed quic-test [1], an open-source QUIC testing platform with the following capabilities:

Technical Infrastructure

  • BBRv2/BBRv3 Implementation: Modified quic-go fork with experimental congestion control
  • Forward Error Correction (FEC): XOR/RS codes with AVX2 optimization
    • Recovery rate: 80-90% for single packet losses
    • Overhead: 10-20% additional bandwidth
  • Automated Test Matrix: 1,260 tests across RTT/load/connection scenarios
  • Network Emulation: Linux tc (netem) for realistic conditions
  • Monitoring Stack: Prometheus metrics + Grafana dashboards
  • Real-world Traces: Moscow ↔ Frankfurt (~50ms RTT), Moscow ↔ US (~150ms RTT)

Key Research Findings [2]

Our October 2025 laboratory research revealed:

Scenario             CUBIC       BBRv2       Improvement
Low RTT (5 ms)       95.2 Mbps   98.1 Mbps   +3.0%
Medium RTT (50 ms)   78.5 Mbps   112.3 Mbps  +43.1%
High RTT (200 ms)    45.2 Mbps   89.7 Mbps   +98.5%

ACK-Frequency Optimization:

  • Optimal frequency: 2-3 (15-25% overhead reduction)
  • Latency improvement: 10-20%

FEC Performance:

  • Critical for mobile networks (4G/5G) with 5% packet loss
  • Minimal latency impact
  • Significant reliability improvement in high-loss scenarios

GitHub Repository: https://github.com/twogc/quic-test
Research Website: https://cloudbridge-research.ru/research


Why PLC Resonates with Our Work

Your PLC architecture addresses three challenges we face in our deployment scenarios:

1. Mobile Network Handovers (4G/5G)

While your focus is LEO satellites, we work extensively with mobile networks where handover patterns are remarkably similar:

  • Predictable timing: Cell tower transitions every 10-30 seconds
  • Transient RTT spikes: +50-200ms during handover
  • Burst packet loss: 5-15% during transition

Your RTT-based detection mechanism (ρ = RTT_short / RTT_long > 1.20) appears directly applicable to terrestrial mobile handovers.
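For concreteness, here is a minimal sketch of such a detector applied to mobile handovers, assuming two EWMA filters with illustrative smoothing constants (the note's actual short/long filter design may differ):

```go
package main

import "fmt"

// handoverDetector sketches the rho = RTT_short / RTT_long > 1.20 test
// using two EWMA filters; the smoothing constants are illustrative.
type handoverDetector struct {
	short, long float64 // EWMA RTT estimates in ms
}

// Sample folds in a new RTT measurement and reports whether the
// short-horizon estimate has inflated past 1.20x the long baseline.
func (d *handoverDetector) Sample(rttMs float64) bool {
	const aShort, aLong = 0.3, 0.02 // fast vs slow smoothing factors
	d.short += aShort * (rttMs - d.short)
	d.long += aLong * (rttMs - d.long)
	return d.short/d.long > 1.20
}

func main() {
	d := &handoverDetector{short: 50, long: 50}
	for i := 0; i < 10; i++ {
		d.Sample(50) // steady state: no trigger
	}
	fired := false
	for i := 0; i < 10; i++ { // simulated +150 ms handover spike
		if d.Sample(200) {
			fired = true
		}
	}
	fmt.Println("handover detected:", fired)
}
```

Because the slow filter barely moves during a short spike, the ratio fires within a couple of samples of the RTT inflation, which matches the +50-200 ms transients we see on cell transitions.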

2. Inter-Regional Links

Our RU↔EU routes experience:

  • Variable RTT (40-80ms) due to routing changes
  • Periodic congestion events (predictable patterns)
  • BBR's reactive behavior causing unnecessary rate reductions

PLC's anticipatory control could significantly improve stability.

3. Connection Migration (QUIC Feature)

QUIC supports connection migration (Wi-Fi ↔ LTE), which causes similar RTT inflation patterns to satellite handovers. PLC could enhance this critical feature.


Proposed Collaboration

We are interested in three research directions and would value your guidance:

Direction 1: PLC Validation on BBRv2/v3

Background:
Your PLC was tested on a BBRv1-like implementation. BBRv2 introduced several improvements:

  • Less aggressive STARTUP phase (refined pacing_gain schedule)
  • Improved RTT estimation (better filtering)
  • Enhanced PROBE_RTT mechanism

Research Question:
Does BBRv2's less aggressive STARTUP reduce the catastrophic failure scenario you observed (31,541ms latency spike during asymmetric initialization)?

Our Proposal:

  1. Integrate PLC into our quic-test platform
  2. Validate PLC behavior on BBRv1 (baseline reproduction)
  3. Test PLC on BBRv2 and BBRv3
  4. Measure phase-dependent behavior across all versions
  5. Quantify STARTUP conflict severity: BBRv1 vs BBRv2 vs BBRv3

Expected Outcome:
Determine if BBRv2/v3's architectural improvements naturally mitigate the phase-conflict issue, potentially reducing CoDel dependency.

Timeline: 2-3 weeks


Direction 2: PLC + FEC Integration

Hypothesis:
Combining PLC's anticipatory rate control with our FEC's loss recovery could provide synergistic benefits during handover events.

Proposed Mechanism:

Pre-handover (t-2s):
  PLC detects ρ > 1.20 → reserves capacity (12-22% reduction)
  
Handover start (t+0s):
  PLC reduces rate preemptively
  FEC enabled with 15% overhead
  
Handover peak (t+0.2s):
  FEC recovers 80-90% of burst losses
  PLC's pre-intervention band prevents queue buildup
  
Post-handover (t+1s):
  PLC adaptive gain restores rate
  FEC overhead reduced to 10%
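The adaptive-overhead step in the timeline above could be sketched as a simple mapping from the RTT ratio to a redundancy level. The thresholds and percentages below are illustrative assumptions for discussion, not measured operating points:

```go
package main

import "fmt"

// fecOverhead sketches the adaptive-overhead idea: hold FEC at a 10%
// baseline and raise it to 15-20% while PLC signals a handover.
// All thresholds and percentages here are illustrative assumptions.
func fecOverhead(rho float64) int {
	switch {
	case rho > 1.50: // deep RTT inflation: handover peak
		return 20
	case rho > 1.20: // PLC pre-intervention band
		return 15
	default:
		return 10 // steady-state baseline
	}
}

func main() {
	for _, rho := range []float64{1.0, 1.3, 1.8} {
		fmt.Printf("rho=%.1f -> FEC overhead %d%%\n", rho, fecOverhead(rho))
	}
}
```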

Research Questions:

  1. Does FEC reduce the need for aggressive PLC rate reduction?
  2. Can PLC's handover detection trigger adaptive FEC overhead (10% → 20% during handover)?
  3. What is the combined improvement in jitter and packet loss?

Our Contribution:
We can provide our FEC implementation (C++/AVX2 optimized) for integration testing.

Timeline: 3-4 weeks


Direction 3: Mobile Network Extension

Research Question:
Can PLC's RTT-based detection generalize to terrestrial mobile handovers (4G/5G, Wi-Fi ↔ LTE)?

Proposed Validation:

  1. Collect real 4G/5G handover traces (we have access to mobile network testbeds)
  2. Characterize handover patterns (RTT spike magnitude, duration, frequency)
  3. Calibrate PLC parameters (ρ threshold, reserve_pct) for mobile scenarios
  4. Compare with LEO results

Expected Outcome:
Demonstrate PLC's applicability beyond LEO satellites, expanding its impact to billions of mobile users.

Timeline: 4-6 weeks


Technical Questions

Before proceeding, we would appreciate your insights on several technical aspects:

Q1: Phase-Aware BBR Integration

You mentioned the need for phase-aware integration, which requires exposing BBR's internal state machine (STARTUP, DRAIN, PROBE_BW, PROBE_RTT).

Question:
Have you explored standardizing this API? For example:

// BBRState exposes read-only BBR internals to a supervisory wrapper.
type BBRState interface {
    GetPhase() Phase        // STARTUP, DRAIN, PROBE_BW, or PROBE_RTT
    GetPacingGain() float64 // current pacing_gain multiplier
    GetCwndGain() float64   // current cwnd_gain multiplier
}

This would enable PLC to:

  • Disable damping during STARTUP
  • Boost intervention during PROBE_BW
  • Coordinate with PROBE_RTT
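A self-contained sketch of how a wrapper might gate its damping through such a hypothetical interface (the stub implementation and the per-phase gains are illustrative, not an existing quic-go API):

```go
package main

import "fmt"

// Phase enumerates BBR's states; this mirrors the interface proposed
// above and is an illustrative sketch, not an existing quic-go API.
type Phase int

const (
	Startup Phase = iota
	Drain
	ProbeBW
	ProbeRTT
)

type BBRState interface {
	GetPhase() Phase
	GetPacingGain() float64
	GetCwndGain() float64
}

// plcGate returns the wrapper's damping strength for the current phase:
// fully disabled in STARTUP and PROBE_RTT, full strength in PROBE_BW.
func plcGate(s BBRState) float64 {
	switch s.GetPhase() {
	case Startup, ProbeRTT:
		return 0.0 // do not fight exponential probing or min_rtt sampling
	case ProbeBW:
		return 1.0 // normal supervisory damping
	default:
		return 0.5 // conservative damping during DRAIN
	}
}

// fakeBBR is a stub implementation for demonstration only.
type fakeBBR struct{ phase Phase }

func (f fakeBBR) GetPhase() Phase        { return f.phase }
func (f fakeBBR) GetPacingGain() float64 { return 1.0 }
func (f fakeBBR) GetCwndGain() float64   { return 2.0 }

func main() {
	fmt.Println(plcGate(fakeBBR{Startup}), plcGate(fakeBBR{ProbeBW}))
}
```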

Our Contribution:
We can implement this in quic-go and propose it as a standard interface for congestion control wrappers.

Q2: CoDel Dependency in Production QUIC Stacks

Your results show CoDel is essential for safety (31,541ms → 119ms containment).

Question:
Most production QUIC stacks (Chromium, quic-go, LSQUIC) operate in user-space without kernel-level AQM. How would you recommend deploying PLC in these environments?

Options we're considering:

  1. Application-layer AQM: Implement CoDel-like logic in QUIC stack
  2. Gateway-side deployment: PLC + CoDel on network gateway
  3. Hybrid approach: PLC in application, CoDel in gateway

Which approach do you recommend?

Q3: Multi-Flow Scalability

Your multi-flow tests used 2 flows with asymmetric start.

Question:
We routinely test with 4-16 concurrent connections. Are there known scalability limits or fairness issues at higher flow counts?

Specific concerns:

  • Does PLC maintain fairness under per-flow fair queueing (FQ)?
  • How does PLC behave when flows have different RTTs (e.g., 50ms vs 200ms)?
  • Is there a recommended maximum flow count per PLC instance?

Q4: FEC Integration Considerations

Question:
Have you considered integrating Forward Error Correction with PLC? Specifically:

  1. Could PLC's handover detection trigger adaptive FEC overhead?
  2. Would FEC's loss recovery reduce the need for aggressive rate reduction?
  3. Are there potential conflicts between PLC's rate modulation and FEC's bandwidth overhead?

Q5: Real-World Deployment Experience

Question:
Have you had opportunities to test PLC on actual LEO terminals (Starlink, OneWeb, Kuiper)? We would be very interested in any field validation results or deployment challenges you've encountered.


What We Can Offer

1. Testing Infrastructure

  • quic-test platform: Ready for PLC integration
  • Automated test matrix: 1,260+ test scenarios
  • Real network traces: RU↔EU, RU↔US, mobile networks
  • Monitoring stack: Prometheus + Grafana

2. FEC Implementation

  • Source code: C++/AVX2 optimized
  • Performance: 80-90% recovery, 10-20% overhead
  • Integration: Ready for QUIC datagram layer

3. Mobile Network Access

  • 4G/5G testbeds: Real handover traces
  • Wi-Fi ↔ LTE scenarios: Connection migration testing
  • Diverse conditions: Urban, suburban, rural

4. Publication Support

  • Technical blog: https://cloudbridge-research.ru (Russian tech community)
  • Habr.com: Large Russian developer audience (100k+ readers)
  • GitHub: Open-source release with full reproducibility

Proposed Pilot Project

To validate mutual interest and technical feasibility, we propose a small pilot:

Phase 1: Baseline Validation (Week 1-2)

Goal: Reproduce your BBRv1 results

Tasks:

  1. Implement PLC wrapper in quic-test
  2. Replicate your Heavy Load scenario (75 handovers/500s)
  3. Validate key metrics:
    • Mean latency: 848ms → 544ms (target: -304ms)
    • p99 latency: 1154ms → 1109ms (target: -45ms)
    • Compliance: 0% → 8% (target: +8pp)

Deliverable: Technical report comparing our results with yours

Phase 2: BBRv2 Extension (Week 3-4)

Goal: Test PLC on BBRv2

Tasks:

  1. Integrate PLC with BBRv2 implementation
  2. Measure phase-dependent behavior (STARTUP vs PROBE_BW)
  3. Quantify STARTUP conflict severity vs BBRv1

Deliverable: Comparative analysis (BBRv1 vs BBRv2)

Phase 3: Public Release (Week 5-6)

Goal: Share results with community

Tasks:

  1. Prepare joint blog post (English + Russian)
  2. Release PLC integration in quic-test (GitHub)
  3. Publish results on cloudbridge-research.ru

Deliverable: Public announcement with full reproducibility

Total Duration: 6 weeks
Commitment: 20-30 hours/week from our team
Cost: None (open collaboration)


Next Steps

If you're interested in exploring this collaboration, we suggest:

Immediate (This Week)

  1. Initial call: 30-minute video discussion to align on goals
  2. Technical review: Share our quic-test architecture for your feedback
  3. Scope agreement: Finalize pilot project scope

Short-term (Weeks 1-2)

  1. PLC integration: Begin implementation in quic-test
  2. Baseline validation: Reproduce your BBRv1 results
  3. Progress updates: Weekly email summaries

Medium-term (Weeks 3-6)

  1. BBRv2/v3 testing: Extended validation
  2. FEC integration: If pilot succeeds
  3. Public release: Joint announcement

About CloudBridge Research

We are a network protocol research group focused on:

  • QUIC optimization: BBRv2/v3, FEC, ACK-Frequency
  • Inter-regional connectivity: RU↔EU, RU↔US routes
  • Mobile networks: 4G/5G handover resilience
  • Open-source tools: quic-test, AI Routing Lab, MASQUE VPN

Team:

  • Network engineers with production QUIC deployment experience
  • Researchers with academic backgrounds in congestion control
  • Open-source contributors to quic-go ecosystem

Publications:

  • Habr.com (Russian tech blog): 4 articles on QUIC/BBR (10k+ views)
  • GitHub: 3 open-source projects (quic-test, ai-routing-lab, masque-vpn)
  • Research website: https://cloudbridge-research.ru

Contact:


Closing

Your PLC work represents a significant advancement in latency-sensitive congestion control. We believe our complementary expertise (BBRv2/v3, FEC, mobile networks) could extend PLC's impact beyond LEO satellites to terrestrial networks serving billions of users.

We would be honored to collaborate with you on validating and extending this important work. Please let us know if you're interested in discussing this further.

Looking forward to your response!

Best regards,

Maksim Lanies


References:

[1] quic-test: https://github.com/twogc/quic-test
[2] Experimental QUIC Laboratory Research Report (October 2025): https://cloudbridge-research.ru/research
