The TCP Window Size Problem
You've just upgraded to a 1 Gbps WAN link connecting your offices. You run a file transfer and... it's only achieving 4 Mbps. What happened?
This frustrating scenario is one of the most common network performance issues, and it's almost always caused by TCP window size limitations. Understanding the Bandwidth-Delay Product (BDP) and how to tune TCP window sizes is essential for anyone managing WAN connections, cloud infrastructure, or distributed applications.
What is the Bandwidth-Delay Product?
The Bandwidth-Delay Product (BDP) is a fundamental networking concept that determines how much data must be "in flight" on a network at any given time to fully utilize available bandwidth.
The Formula
BDP = Bandwidth × Round-Trip Time (RTT)
Or more precisely (converting to bytes):
BDP (bytes) = (Bandwidth in bits/sec × RTT in seconds) / 8
Real-World Example
Let's calculate the BDP for a common scenario:
Network: 1 Gbps link with 100ms round-trip time
BDP = (1,000,000,000 bits/sec × 0.1 sec) / 8
BDP = 100,000,000 bits / 8
BDP = 12,500,000 bytes
BDP = 12.5 MB
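If you'd rather script this than do it by hand, here is a quick sketch using awk (the bandwidth and RTT values are just the example figures above; substitute your own):
# BDP from bandwidth (Mbps) and RTT (ms)
awk -v mbps=1000 -v rtt_ms=100 'BEGIN {
    bdp_bytes = mbps * 1000000 * (rtt_ms / 1000) / 8
    printf "BDP = %.0f bytes (%.2f MB)\n", bdp_bytes, bdp_bytes / 1000000
}'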
What This Means
A BDP of 12.5 MB is how much unacknowledged data the path can hold: the sender must be able to keep 12.5 MB in flight before the first acknowledgment returns. If your TCP window size is smaller than this, you cannot fully utilize the available bandwidth.
With the default 64 KB window on this link:
Effective Throughput = (Window Size × 8) / RTT
Effective Throughput = (65,535 bytes × 8) / 0.1 sec
Effective Throughput = 524,280 bits / 0.1 sec
Effective Throughput ≈ 5.2 Mbps
You're achieving only about 0.5% of your 1 Gbps bandwidth!
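The same arithmetic works in reverse: given a window size and an RTT, you can estimate the throughput ceiling. Another awk sketch, using the 65,535-byte maximum window from above:
# Throughput ceiling for a given window (bytes) and RTT (ms)
awk -v window_bytes=65535 -v rtt_ms=100 'BEGIN {
    mbps = window_bytes * 8 / (rtt_ms / 1000) / 1000000
    printf "Max throughput = %.1f Mbps\n", mbps
}'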
Why Default TCP Windows Don't Work for WAN
TCP was standardized in the early 1980s (RFC 793), when networks were primarily LANs with low latency (1-10ms) and limited bandwidth. The original specification limited the window to 65,535 bytes (64 KB) because the window size field in the TCP header is only 16 bits.
LAN Performance (Low Latency)
On a LAN with 1ms RTT and 64 KB window:
Throughput = (65,535 bytes × 8) / 0.001 sec ≈ 524 Mbps
This is perfectly adequate for most LAN scenarios.
WAN Performance (High Latency)
The same 64 KB window on a WAN with 150ms RTT:
Throughput = (65,535 bytes × 8) / 0.15 sec ≈ 3.5 Mbps
This is why your gigabit WAN link feels like a dial-up connection!
TCP Window Scaling (RFC 1323)
To remove the 64 KB limitation, RFC 1323 (since superseded by RFC 7323) introduced TCP window scaling, which allows window sizes up to 1 GB (2^30 bytes).
How Window Scaling Works
Window scaling multiplies the advertised window by a scale factor negotiated during the TCP handshake:
Actual Window Size = Advertised Window × 2^(scale factor)
Example: With a scale factor of 8:
Actual Window = 64 KB × 2^8 = 64 KB × 256 = 16 MB
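Because the unscaled maximum is 65,535 bytes and RFC 1323 caps the scale factor at 14, a short awk loop can print the whole range of possible maximum windows (a sketch, purely for illustration):
# Maximum window for each scale factor (0-14)
awk 'BEGIN {
    for (s = 0; s <= 14; s++)
        printf "scale factor %2d -> max window %11d bytes (%.1f MB)\n", s, 65535 * 2^s, 65535 * 2^s / 1048576
}'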
Enabling Window Scaling
Window scaling must be enabled on both endpoints and is negotiated during the SYN handshake. Once a connection is established, the scale factor cannot change.
Linux (usually enabled by default):
# Check current setting
sysctl net.ipv4.tcp_window_scaling
# Enable window scaling
sudo sysctl -w net.ipv4.tcp_window_scaling=1
# Make permanent
echo "net.ipv4.tcp_window_scaling = 1" | sudo tee -a /etc/sysctl.conf
Windows (enabled by default in Windows Vista+):
# Check current setting
netsh interface tcp show global
# Enable window scaling
netsh interface tcp set global autotuninglevel=normal
# For high-throughput WAN, use experimental
netsh interface tcp set global autotuninglevel=experimental
macOS (enabled by default):
# Check current setting
sysctl net.inet.tcp.win_scale_factor
# Window scaling is always enabled on macOS
# The scale factor is automatically determined
Calculating Optimal TCP Window Size
To fully utilize your network, your TCP window size must be at least equal to the BDP.
Step 1: Measure Your Network
- Bandwidth: Check with your ISP or cloud provider
- RTT: Measure with ping:
ping -c 10 remote-host.com
Look for the average RTT in the summary.
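You can fold the measurement and the BDP calculation into one step. The sketch below assumes Linux iputils ping output (the final "rtt min/avg/max/mdev" summary line) and a known bandwidth of 1000 Mbps; adjust both for your environment:
# Measure average RTT, then compute BDP for a known bandwidth
RTT_MS=$(ping -c 10 remote-host.com | tail -n 1 | awk -F'/' '{print $5}')
awk -v mbps=1000 -v rtt_ms="$RTT_MS" 'BEGIN {
    printf "Avg RTT = %s ms, BDP = %.2f MB\n", rtt_ms, mbps * rtt_ms / 8 / 1000
}'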
Step 2: Calculate BDP
Use this formula:
BDP (KB) = (Bandwidth in Mbps × RTT in ms) / 8
Divide the result by 1,000 to get MB.
Example calculations:
| Bandwidth | RTT | BDP | Required Window |
|---|---|---|---|
| 100 Mbps | 10ms | 125 KB | 128 KB |
| 100 Mbps | 50ms | 625 KB | 1 MB |
| 100 Mbps | 150ms | 1.875 MB | 2 MB |
| 1 Gbps | 50ms | 6.25 MB | 8 MB |
| 1 Gbps | 150ms | 18.75 MB | 20 MB |
| 10 Gbps | 10ms | 12.5 MB | 16 MB |
| 10 Gbps | 100ms | 125 MB | 128 MB |
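The table rows come straight from the formula. Here is a small sketch that computes both the BDP and the 2× headroom target recommended later in this article (inputs are bandwidth in Mbps and RTT in ms):
# BDP and a 2x-BDP buffer target in bytes, ready for sysctl
awk -v mbps=1000 -v rtt_ms=150 'BEGIN {
    bdp = mbps * rtt_ms / 8 * 1000          # bytes
    printf "BDP = %.2f MB, 2x BDP buffer = %.0f bytes\n", bdp / 1000000, 2 * bdp
}'
Note that the Step 3 examples below round to binary megabytes (37.5 × 1,048,576 = 39,321,600 bytes), which is why those figures are slightly larger.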
Step 3: Set Socket Buffers
The TCP window size cannot exceed the socket buffer size. You must configure OS buffers to accommodate your BDP.
Linux - Set TCP buffer sizes:
# For 1 Gbps × 150ms (BDP = 18.75 MB, recommend 2× = 37.5 MB)
# Syntax: min default max (in bytes)
# rmem = receive buffer, wmem = send buffer
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 39321600"
sudo sysctl -w net.ipv4.tcp_wmem="4096 87380 39321600"
# Make permanent
cat << EOF | sudo tee -a /etc/sysctl.conf
net.ipv4.tcp_rmem = 4096 87380 39321600
net.ipv4.tcp_wmem = 4096 87380 39321600
EOF
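One related knob worth knowing about: the tcp_rmem/tcp_wmem maximums govern the kernel's auto-tuning, but applications that set SO_RCVBUF/SO_SNDBUF themselves are capped by net.core.rmem_max and net.core.wmem_max instead, so it is common to raise those alongside (same 37.5 MB figure, shown as a sketch):
# Cap for application-set socket buffers (SO_RCVBUF / SO_SNDBUF)
sudo sysctl -w net.core.rmem_max=39321600
sudo sysctl -w net.core.wmem_max=39321600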
Windows - Enable auto-tuning:
# Windows auto-tuning handles buffer sizing dynamically
netsh interface tcp set global autotuninglevel=normal
# For extreme WAN scenarios (satellite, transcontinental)
netsh interface tcp set global autotuninglevel=experimental
# Auto-tuning manages the maximum receive window automatically;
# there is no separate window-size value to set here.
macOS - Set socket buffer maximums:
# For 1 Gbps × 150ms scenario (37.5 MB buffers)
# Raise the overall socket buffer cap first...
sudo sysctl -w kern.ipc.maxsockbuf=41943040
# ...then the per-socket send/receive defaults
sudo sysctl -w net.inet.tcp.sendspace=39321600
sudo sysctl -w net.inet.tcp.recvspace=39321600
# Make permanent
cat << EOF | sudo tee -a /etc/sysctl.conf
kern.ipc.maxsockbuf=41943040
net.inet.tcp.sendspace=39321600
net.inet.tcp.recvspace=39321600
EOF
Understanding Buffer Sizing
Setting the right buffer size is a balance between throughput and memory consumption.
Minimum Buffer Size = BDP
This is the absolute minimum needed to saturate the link. Any smaller and the link sits partly idle while the sender waits for acknowledgments.
Recommended Buffer Size = 2× BDP
This provides headroom for:
- RTT variance (network conditions change)
- Traffic bursts
- Protocol overhead
- Multiple simultaneous connections
Maximum Useful Buffer Size = 4× BDP
Beyond this, you get diminishing returns and risk bufferbloat—excessive buffering that causes high latency under load.
Memory Considerations
Per-connection overhead: Large buffers consume memory
For a server handling 10,000 concurrent connections with 40 MB buffers:
Memory usage = 10,000 × (40 MB send + 40 MB receive)
Memory usage = 800 GB
This is why servers should:
- Use auto-tuning that only allocates large buffers when needed
- Set conservative maximum buffer sizes
- Monitor memory usage under load
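For a quick sanity check of a proposed maximum against available RAM, here is a rough worst-case estimator (awk sketch; the connection count and per-direction buffer size are the inputs):
# Worst-case socket buffer memory: connections x (send + receive)
awk -v conns=10000 -v buf_mb=40 'BEGIN {
    printf "Worst case = %.0f GB\n", conns * 2 * buf_mb / 1000
}'
In practice auto-tuning only grows the buffers that need it, so real usage is far lower; the point is to keep the worst case in view.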
TCP Congestion Control Algorithms
Buffer sizing alone isn't enough—you also need the right congestion control algorithm.
TCP Cubic (Default on Most Systems)
Best for: General-purpose networking, LANs, low-latency WANs
How it works: Grows the congestion window along a cubic curve and backs off when packet loss is detected
Limitations: Slow to ramp up on high-latency links, relies heavily on packet loss as a signal
TCP BBR (Bottleneck Bandwidth and RTT)
Best for: High-latency WANs (>50ms), lossy networks, satellite, cellular
How it works: Actively measures available bandwidth and RTT, doesn't rely on packet loss
Performance: Typically 2-10× faster than Cubic on high-latency or lossy links
Requirements: Linux kernel 4.9+ (not available on Windows/macOS)
Enable BBR on Linux:
# Check available congestion control algorithms
sysctl net.ipv4.tcp_available_congestion_control
# Enable BBR
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr
# Also enable fq (fair queueing) packet scheduler
sudo sysctl -w net.core.default_qdisc=fq
# Make permanent
cat << EOF | sudo tee -a /etc/sysctl.conf
net.ipv4.tcp_congestion_control = bbr
net.core.default_qdisc = fq
EOF
Performance Comparison
Scenario: 1 Gbps link with 100ms RTT and 0.1% packet loss
| Algorithm | Throughput | Notes |
|---|---|---|
| Reno | ~150 Mbps | Legacy, very loss-sensitive |
| Cubic | ~450 Mbps | Default on most Linux distros |
| BBR | ~950 Mbps | Near-optimal bandwidth usage |
Avoiding Bufferbloat
Bufferbloat occurs when excessive buffering causes high latency under load. Your throughput is great, but latency spikes to 500ms+ during transfers.
Symptoms
- Good throughput but terrible latency during large transfers
- Ping times spike from 20ms to 500ms+ under load
- VoIP/video calls become unusable during file transfers
- Gaming performance degrades during downloads
Causes
- Buffers set to 10× BDP or larger
- No active queue management (AQM)
- Legacy congestion control (Reno, Tahoe)
Solutions
1. Right-size buffers to 2-3× BDP:
# Calculate your BDP first
# Then set buffers to 2× BDP, not 10× BDP
2. Use BBR congestion control:
BBR actively drains queues to measure true RTT, preventing buffer accumulation.
3. Enable fair queueing (fq):
sudo sysctl -w net.core.default_qdisc=fq
4. Monitor latency under load:
# Run ping while performing large transfers
ping remote-host.com
# Latency should remain close to baseline
# Spikes >2× baseline indicate bufferbloat
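A simple way to run that check end to end (a sketch that assumes an iperf3 server is already listening on remote-host.com):
# Baseline latency, then latency while the link is saturated
ping -c 20 remote-host.com | tail -n 1
iperf3 -c remote-host.com -t 30 &
sleep 5                      # let the transfer ramp up
ping -c 20 remote-host.com | tail -n 1
wait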
Real-World Scenarios
Transcontinental WAN (US East ↔ US West)
Network: 1 Gbps, 75ms RTT
BDP: (1000 Mbps × 75ms) / 8 = 9.375 MB
Configuration:
# Linux
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 19660800"
sudo sysctl -w net.ipv4.tcp_wmem="4096 87380 19660800"
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr
sudo sysctl -w net.core.default_qdisc=fq
Expected improvement: 10-20× faster than default settings
Satellite Internet (GEO)
Network: 25 Mbps, 550ms RTT (geostationary orbit)
BDP: (25 Mbps × 550ms) / 8 = 1.72 MB
Configuration:
# Linux - satellite requires aggressive tuning
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 3600000"
sudo sysctl -w net.ipv4.tcp_wmem="4096 87380 3600000"
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr
sudo sysctl -w net.core.default_qdisc=fq
# Also increase initial congestion window
sudo ip route change default via <gateway> initcwnd 32
Expected improvement: 20-25× faster (BBR handles high latency + packet loss well)
Datacenter East-West Traffic
Network: 10 Gbps, 2ms RTT
BDP: (10000 Mbps × 2ms) / 8 = 2.5 MB
Configuration:
# Linux
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 5242880"
sudo sysctl -w net.ipv4.tcp_wmem="4096 87380 5242880"
sudo sysctl -w net.ipv4.tcp_congestion_control=cubic # Cubic fine for low-latency
Expected improvement: 5-10× faster with proper tuning
Cellular (4G/5G)
Network: 50 Mbps, 30-80ms RTT (variable)
BDP: (50 Mbps × 80ms) / 8 = 500 KB (use worst-case RTT)
Configuration:
# Linux - BBR handles variable latency well
sudo sysctl -w net.ipv4.tcp_rmem="4096 87380 1048576"
sudo sysctl -w net.ipv4.tcp_wmem="4096 87380 1048576"
sudo sysctl -w net.ipv4.tcp_congestion_control=bbr
sudo sysctl -w net.core.default_qdisc=fq
Expected improvement: 3-8× faster, more stable throughput
Verifying Your Tuning Worked
After applying tuning, verify the improvements:
1. Test Throughput with iperf3
Server:
iperf3 -s
Client:
# Test for 30 seconds
iperf3 -c server-ip -t 30
# Expected results:
# Before tuning: 5-20 Mbps on 1 Gbps link
# After tuning: 900-950 Mbps on 1 Gbps link
2. Check TCP Connection Info (Linux)
# Show TCP info for active connections
ss -ti | grep -E "wscale|sack|bbr|cubic"
# Look for:
# wscale: Window scaling enabled
# sack: Selective acknowledgment enabled
# cubic/bbr: Congestion control algorithm
Example output:
cubic wscale:7,7 rto:204 rtt:3.662/1.766 ato:40 mss:1448 pmtu:1500 rcvmss:1448
advmss:1448 cwnd:118 ssthresh:110 bytes_sent:1048576 bytes_acked:1048576
3. Verify System Settings
# Check all tuning parameters
sysctl net.ipv4.tcp_rmem
sysctl net.ipv4.tcp_wmem
sysctl net.ipv4.tcp_window_scaling
sysctl net.ipv4.tcp_congestion_control
sysctl net.core.default_qdisc
4. Monitor Under Load
# Run iperf3 while monitoring ping times
# Ping should stay close to baseline (not 10× higher)
ping -c 100 server-ip
# Check for packet loss
# Should be <1% in most scenarios
Common Pitfalls and Troubleshooting
Both Endpoints Must Be Tuned
If you tune only the client but not the server, downloads will be fast but uploads will be slow (or vice versa). Both ends must support window scaling and have adequate buffers.
Firewalls and Middleboxes
Some firewalls and middleboxes strip TCP options such as window scaling. This silently caps every connection at a 64 KB window and cripples throughput on high-BDP paths.
Test: Use tcpdump/Wireshark to verify SYN packets include window scaling option:
sudo tcpdump -i any -nn 'tcp[tcpflags] & tcp-syn != 0' -A
Look for wscale in the options.
MTU/MSS Issues
If your MTU is smaller than 1500 bytes (VPN, PPPoE), each packet carries proportionally more header overhead, so pad the window slightly to compensate:
Optimal Window ≈ BDP × (1 + MTU reduction fraction)
For example, a 1400-byte tunnel MTU is roughly 7% below 1500, so add about 7% to the calculated window.
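To check what your path actually carries, you can probe with don't-fragment pings (Linux iputils ping shown; 1472 bytes of payload plus 28 bytes of IP/ICMP headers makes a 1500-byte packet):
# Succeeds only if the path MTU is at least 1500 bytes
ping -c 3 -M do -s 1472 remote-host.com
# If it fails, retry with smaller payloads (e.g. 1372) to find the real path MTU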
Auto-Tuning Conflicts
On some systems, manual buffer settings conflict with auto-tuning. If performance is still poor:
Linux: Make sure receive-buffer auto-tuning is enabled, and avoid setting SO_RCVBUF explicitly in applications (an explicit setsockopt disables auto-tuning for that socket):
sudo sysctl -w net.ipv4.tcp_moderate_rcvbuf=1
Windows: Ensure auto-tuning is enabled:
netsh interface tcp set global autotuninglevel=normal
Conclusion
Understanding and properly configuring TCP window sizes based on the Bandwidth-Delay Product is essential for achieving optimal network performance, especially on high-latency WAN links.
Key takeaways:
- Calculate your BDP: Bandwidth × RTT / 8
- Enable window scaling: Required for windows >64 KB
- Set buffers to 2× BDP: Balances throughput and memory usage
- Use BBR for WAN: Dramatically better than Cubic on high-latency links
- Avoid bufferbloat: Don't set buffers >4× BDP
- Tune both endpoints: Client and server must both be configured
- Verify with iperf3: Measure actual throughput improvements
For more detailed calculations and OS-specific tuning commands tailored to your exact network configuration, try our TCP Window Size Calculator. It provides BDP calculations, buffer recommendations, congestion control advice, and copy-paste commands for Linux, Windows, and macOS.