Performance Tuning
This guide covers the key tunables in zerg and how to adjust them for your workload.
Reactor Count
```csharp
ReactorCount = Environment.ProcessorCount
```

Rule of thumb: Start with one reactor per CPU core. Each reactor runs a tight event loop on a dedicated thread, so running more reactors than cores leads to context-switching overhead.
For mixed workloads (some connections do async I/O like database calls), you may benefit from slightly more reactors than cores since handler tasks yield to the thread pool during awaits.
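As a rough sketch of that rule of thumb (the `hasAsyncHandlers` flag and the point where the value is passed to zerg are assumptions for illustration, not part of the API):

```csharp
// Sketch only: choose a reactor count from the rules of thumb above.
int cores = Environment.ProcessorCount;

// CPU-bound handlers: one reactor per core.
// Handlers that await external I/O (e.g. database calls): a couple of extra
// reactors can help, since handler tasks yield to the thread pool during awaits.
bool hasAsyncHandlers = true; // set for your workload
int reactorCount = hasAsyncHandlers ? cores + 2 : cores;
```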
CQ Timeout
```csharp
CqTimeout = 1_000_000 // nanoseconds (1 ms)
```

The CQ timeout controls how long the reactor sleeps when no completions are available.
| CqTimeout (ns) | Tail Latency | CPU Usage |
|---|---|---|
| 100_000 (0.1 ms) | Very low | High (more frequent wakeups) |
| 1_000_000 (1 ms) | Low | Moderate (default) |
| 10_000_000 (10 ms) | Moderate | Low |
| 100_000_000 (100 ms) | High | Very low |
For latency-sensitive servers, use 0.1-1 ms. For background or batch-oriented servers, 10-100 ms is fine.
The acceptor uses 100 ms by default since accept bursts are infrequent.
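For example, hedged against the table above (only `CqTimeout` itself is a documented tunable; the named constants are just for readability):

```csharp
// Pick a CqTimeout (nanoseconds) to match the workload; these mirror the table above.
const long LatencySensitiveNs = 100_000;     // 0.1 ms – lowest tail latency, highest CPU
const long DefaultNs          = 1_000_000;   // 1 ms   – the reactor default
const long BatchNs            = 100_000_000; // 100 ms – background/batch (also the acceptor default)
```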
Buffer Ring Sizing
RecvBufferSize
```csharp
RecvBufferSize = 32 * 1024 // 32 KB per buffer
```

Each kernel recv writes into one of these buffers. If a recv delivers more data than the buffer size, the kernel fills the buffer and the remainder arrives in the next CQE.
| Workload | Recommended Size |
|---|---|
| Small messages (HTTP/1.1 requests) | 4-8 KB |
| Mixed traffic | 16-32 KB (default) |
| Large uploads/downloads | 64-128 KB |
| Websockets with large frames | 64+ KB |
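A small illustration of mapping the table to a value (the `Workload` enum and helper are invented for this example; they are not part of zerg):

```csharp
// Hypothetical workload classification, used only to express the table above in code.
enum Workload { SmallMessages, Mixed, LargeTransfers, LargeWebsocketFrames }

static int PickRecvBufferSize(Workload w) => w switch
{
    Workload.SmallMessages        => 8 * 1024,    // HTTP/1.1-style requests
    Workload.Mixed                => 32 * 1024,   // default
    Workload.LargeTransfers       => 128 * 1024,  // large uploads/downloads
    Workload.LargeWebsocketFrames => 64 * 1024,   // websockets with large frames
    _                             => 32 * 1024,
};
```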
BufferRingEntries
```csharp
BufferRingEntries = 16 * 1024 // 16384 buffers
```

Total receive buffers available to the kernel per reactor. Must be a power of two.
Size it based on:
- MaxConnectionsPerReactor – at minimum, one buffer per active connection
- Data holding time – if handlers hold buffers during async work, you need more
- Burst capacity – buffers to absorb data bursts without stalling
Memory impact: BufferRingEntries * RecvBufferSize per reactor.
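For a sense of scale, a quick calculation with the defaults above:

```csharp
// Worked example: receive-buffer memory per reactor with the default settings.
long bufferRingEntries = 16 * 1024;  // 16384 buffers
long recvBufferSize    = 32 * 1024;  // 32 KB each
long bytesPerReactor   = bufferRingEntries * recvBufferSize;
// 16384 * 32768 = 536,870,912 bytes = 512 MiB per reactor;
// multiply by ReactorCount for the whole server.
```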
Ring Entries
```csharp
RingEntries = 8192
```

The SQ/CQ size. This is the maximum number of in-flight I/O operations:
- One multishot recv per active connection
- One send per flushing connection
- Cancel operations
Should be >= MaxConnectionsPerReactor to avoid running out of SQE slots.
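A defensive check you might add when assembling a configuration (a sketch only; the exact shape of the config type is assumed):

```csharp
// Guard against SQE exhaustion: every active connection can hold an SQE slot.
static void ValidateRing(int ringEntries, int maxConnectionsPerReactor)
{
    if (ringEntries < maxConnectionsPerReactor)
        throw new ArgumentException(
            "RingEntries must be >= MaxConnectionsPerReactor to avoid running out of SQE slots.");
}
```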
Batch CQEs
```csharp
BatchCqes = 4096
```

Maximum CQEs processed per loop iteration. Larger values improve throughput under load by amortizing loop overhead, but increase per-loop latency (the time to service all CQEs before sleeping again).
For latency-sensitive applications, consider reducing this to 256-1024.
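For instance, a latency-tuned value might look like this (sketch only):

```csharp
BatchCqes = 512 // drain fewer CQEs per pass: lower per-loop latency, slightly less amortization
```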
SQPOLL
SQPOLL mode creates a kernel thread that continuously polls the submission queue:
```csharp
new ReactorConfig(
    RingFlags: ABI.IORING_SETUP_SQPOLL | ABI.IORING_SETUP_SQ_AFF | ABI.IORING_SETUP_SINGLE_ISSUER,
    SqCpuThread: 4,      // pin to CPU 4
    SqThreadIdleMs: 200  // sleep after 200 ms idle
)
```

Benefits:
- Eliminates the io_uring_enter() syscall for submissions
- Reduced per-submission latency
Costs:
- Dedicates one CPU core per reactor
- Increased power consumption
- Requires CAP_SYS_NICE or appropriate permissions in containers
When to enable:
- You have spare cores (total cores > 2 * reactor count)
- Submission latency is your bottleneck
- You’re already saturating network bandwidth
When to avoid:
- Core-constrained environments (containers, small VMs)
- Acceptor ring (multishot accept generates CQEs from interrupts, not submissions)
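A rough way to express the core-budget rule from the list above (the variables are illustrative, not part of zerg):

```csharp
// SQPOLL dedicates one busy-polling kernel thread (one core) per reactor,
// so only consider it when cores comfortably exceed twice the reactor count.
int reactorCount = 12; // your configured value
bool enoughSpareCores = Environment.ProcessorCount > 2 * reactorCount;
bool considerSqPoll = enoughSpareCores; // and only if submission latency is the bottleneck
```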
DEFER_TASKRUN
```csharp
RingFlags = ABI.IORING_SETUP_SINGLE_ISSUER | ABI.IORING_SETUP_DEFER_TASKRUN
```

This is the default for reactors. It tells the kernel to defer task_work (completion callbacks) until the next io_uring_enter() call, rather than running them in interrupt context.
Benefits:
- Completions arrive at predictable points in the reactor loop
- Reduces latency spikes from interrupt-context work
- Better cache behavior
When to disable: Rarely. Only if you’re seeing issues with specific kernel versions.
Connection Limits
```csharp
MaxConnectionsPerReactor = 8192
```

Upper bound on concurrent connections per reactor. This is a logical limit, not a hard allocation.
Scaling formula:
- Total concurrent connections = ReactorCount * MaxConnectionsPerReactor
- With defaults: 1 * 8192 = 8,192 connections
- With 12 reactors: 12 * 8192 = 98,304 connections
Ensure MaxConnectionsPerReactor <= RingEntries to avoid SQE exhaustion.
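Going the other way, you can size the reactor count from a connection target (a quick illustration, not a zerg API):

```csharp
// How many reactors are needed for a target number of concurrent connections?
int maxConnectionsPerReactor = 8192;
int targetConnections = 200_000;
int reactorsNeeded = (targetConnections + maxConnectionsPerReactor - 1) / maxConnectionsPerReactor;
// (200_000 + 8191) / 8192 = 25 reactors (ceiling division)
```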
Listen Backlog
```csharp
Backlog = 65535
```

Kernel queue for pending connections (accepted by the kernel but not yet accepted by userspace). 65535 is the Linux maximum. Reduce it only if you want to reject connections under load.
Benchmarking Tips
- Warm up – run at least 10 seconds of load before measuring
- Pin cores – use CPU affinity to prevent migration
- Disable turbo boost – for consistent results, disable CPU frequency scaling
- Use wrk or h2load – standard HTTP benchmarking tools
- Watch for kernel limits – check ulimit -n (file descriptor limit) and net.core.somaxconn
- Profile with perf – perf top shows where CPU time is spent
System Tuning
```bash
# Increase file descriptor limit
ulimit -n 1000000

# Increase somaxconn (listen backlog limit)
sysctl -w net.core.somaxconn=65535

# Increase local port range (for clients)
sysctl -w net.ipv4.ip_local_port_range="1024 65535"

# Disable TCP timestamps (minor latency improvement)
sysctl -w net.ipv4.tcp_timestamps=0
```

Configuration Matrix
| Scenario | ReactorCount | RecvBufferSize | BufferRingEntries | CqTimeout |
|---|---|---|---|---|
| Low-latency API | CPU count | 4 KB | 8192 | 100,000 ns |
| HTTP server | CPU count | 32 KB | 16384 | 1,000,000 ns |
| Proxy/gateway | CPU count | 64 KB | 32768 | 500,000 ns |
| File transfer | CPU count / 2 | 128 KB | 4096 | 10,000,000 ns |
| IoT/many connections | CPU count | 2 KB | 65536 | 1,000,000 ns |
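As an example of reading the matrix, the Low-latency API row expressed with the tunables from this guide (how each value is passed to zerg is assumed, not shown here):

```csharp
// "Low-latency API" profile from the matrix above.
int  reactorCount      = Environment.ProcessorCount; // one reactor per core
int  recvBufferSize    = 4 * 1024;                   // 4 KB buffers for small requests
int  bufferRingEntries = 8 * 1024;                   // 8192 (power of two)
long cqTimeoutNs       = 100_000;                    // 0.1 ms wakeup for low tail latency
```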