Architecture

One reactor per thread, typically one per core. Each owns an io_uring, a SO_REUSEPORT listener, a connection table, buffer rings, and a connection pool - and is the sole writer of all of them. Nothing is shared between reactors. This page is the system view; the core internals page goes inside the classes, type by type.

The loop

A reactor's life is a single loop: drain the cross-thread queues, enter the kernel once (io_uring_enter - submitting everything staged and waiting for at least one completion), then dispatch the whole completion batch.

while (true)
{
    // Work handed over by off-reactor handlers. Cheap when empty.
    DrainReturnQ();      // buffer returns
    DrainFlushQ();       // flushes
    DrainRecycleQ();     // connection teardowns
    DrainRemoteOps();    // client ops

    // One syscall per batch: submit everything staged, wait for >= 1 CQE.
    Ring.SubmitAndWait(1);

    // Read the CQ tail once, dispatch the whole batch, publish the head once.
    uint ready = Ring.CqReady();

    for (uint i = 0; i < ready; i++)
        Dispatch(in Ring.CqeAt(i));

    Ring.CqAdvance(ready);
}

Dispatching a completion frequently runs handler code - that's the inline-resume model below - so by the time the loop re-enters the kernel, the responses those completions triggered are already staged in the submission queue. One syscall carries the whole request/response batch.

Ring setup

SINGLE_ISSUER - only this thread submits, so the kernel skips SQ locking.
DEFER_TASKRUN - completion work runs batched inside enter instead of interrupting the thread as task-work.
NO_SQARRAY (kernel 6.6+) - drops the SQ indirection array, one store fewer per SQE. Setup falls back automatically on older kernels (EINVAL probe).
SQ-full handling - if the submission queue fills mid-batch, the reactor flushes it with a no-wait enter and continues; submission never blocks on completions.
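
The NO_SQARRAY fallback can be sketched as a retry around ring creation. This is a minimal illustration, not the engine's setup code: the `RingSetup` class and the `setup` delegate (a stand-in for the real io_uring_setup call, returning a ring fd or a negated errno) are hypothetical. The flag values are the ones defined in the Linux UAPI header.

```csharp
using System;

// Hypothetical helper; the engine's real setup path is not shown in this page.
public static class RingSetup
{
    // Values from <linux/io_uring.h>.
    public const uint SingleIssuer = 1u << 12; // IORING_SETUP_SINGLE_ISSUER
    public const uint DeferTaskrun = 1u << 13; // IORING_SETUP_DEFER_TASKRUN
    public const uint NoSqArray    = 1u << 16; // IORING_SETUP_NO_SQARRAY

    public const int EINVAL = 22;

    // `setup` stands in for io_uring_setup: returns a ring fd (>= 0) or -errno.
    public static (int Fd, uint Flags) Create(Func<uint, int> setup)
    {
        uint flags = SingleIssuer | DeferTaskrun | NoSqArray;
        int fd = setup(flags);

        if (fd == -EINVAL)
        {
            // An older kernel rejects the unknown flag: retry without NO_SQARRAY.
            flags &= ~NoSqArray;
            fd = setup(flags);
        }

        if (fd < 0)
            throw new InvalidOperationException($"io_uring_setup failed: errno {-fd}");

        return (fd, flags);
    }
}
```

The caller keeps the returned flags so the rest of the ring code knows whether an SQ indirection array exists.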

Accept, recv, send

Multishot accept - armed once on the listener; every new connection is just a CQE. Accepted sockets get TCP_NODELAY (it doesn't reliably inherit from the listener).
Multishot recv + provided buffers - armed once per connection. The kernel picks a buffer from a pre-registered ring, fills it, and posts a CQE carrying the buffer id. Handlers consume slices zero-copy and return buffers when done.
Send with MSG_WAITALL - the kernel retries short sends internally, so a flush is one SQE and one CQE. A genuinely partial send (error paths) is resubmitted from the offset.
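
The partial-send path amounts to tracking an offset into the flush. The sketch below assumes a hypothetical `PendingFlush` type and treats a zero or negative send result as failure; the engine's actual bookkeeping may differ.

```csharp
using System;

public enum SendOutcome { Done, Resubmit, Failed }

// Hypothetical per-flush state: how much of the write slab the kernel has taken.
public sealed class PendingFlush
{
    public int Offset { get; private set; }  // bytes already accepted by the kernel
    public int Length { get; }               // total bytes in this flush

    public PendingFlush(int length) => Length = length;

    public int Remaining => Length - Offset;

    // `res` is the send CQE result: bytes sent, or a negated errno.
    public SendOutcome OnSendCompleted(int res)
    {
        if (res <= 0)
            return SendOutcome.Failed;       // assumption: treat 0 as failure too

        Offset += res;

        // With MSG_WAITALL the common case is Done on the first CQE;
        // Resubmit means a new send SQE covering [Offset, Length).
        return Offset >= Length ? SendOutcome.Done : SendOutcome.Resubmit;
    }
}
```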

Two buffer-ring modes

Shared (default) - one pool per reactor; every connection draws from it. One recv consumes one whole buffer regardless of size: elastic and simple, but small messages waste space.
Incremental (IOU_PBUF_RING_INC, kernel 6.12+) - a small ring per connection; the kernel appends successive recvs into the same buffer until it fills. Dense packing and per-connection isolation, paid for with refcounted recycling (a buffer returns only when the handler has returned every slice and the kernel is done appending, i.e. F_BUF_MORE is cleared) plus a ring registration per connection (MaxConnections caps the buffer-group ids).

The handler API is identical in both modes; ReturnBuffer(s) picks the right return path.
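
The recycling rule for incremental mode (every slice returned, and the kernel done appending) can be sketched as a small state machine. `IncBuffer` and its members are hypothetical names, and the sketch assumes all calls happen on the reactor thread, so plain fields suffice.

```csharp
// Hypothetical per-buffer bookkeeping for incremental mode; reactor-thread only.
public sealed class IncBuffer
{
    private int _outstandingSlices; // slices handed to the handler, not yet returned
    private bool _kernelDone;       // true once a CQE arrives with F_BUF_MORE cleared

    // Called for each recv CQE that landed in this buffer.
    // `bufMore` mirrors the F_BUF_MORE flag on the CQE.
    public void OnRecv(bool bufMore)
    {
        _outstandingSlices++;
        if (!bufMore)
            _kernelDone = true;     // kernel will not append to this buffer again
    }

    // Called when the handler returns one slice.
    // Returns true when the buffer may be re-added to the connection's ring.
    public bool ReturnSlice()
    {
        _outstandingSlices--;

        if (_outstandingSlices != 0 || !_kernelDone)
            return false;           // handler or kernel still holds the buffer

        _kernelDone = false;        // reset for the buffer's next life
        return true;
    }
}
```

Both conditions matter: a handler that returns slices promptly must still wait for the kernel to finish appending, otherwise the kernel could write into memory that has already been re-provided.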

Completion routing: tags and generations

Every SQE carries its routing in user_data:

[63:56] kind     accept · recv · send · wake · client · cancel
[47:32] gen      the connection's generation at submit time
[31:0]  fd       (or the client-op slot)

Dispatch is an array index (connections[fd] - fds are small dense integers, so an array beats hashing) plus a generation check. The generation is what makes fd reuse safe: when a connection dies, its fd number is immediately reusable, and a straggler CQE from the old life would otherwise reach the new tenant. On a stale generation, the CQE is dropped and its buffer is returned. The same guard rides the flush queue and incremental buffer returns.
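
A minimal sketch of the tag layout and the generation guard, following the bit positions listed above. The `OpKind` numeric values, the `Tag` and `Router` helpers, and the `Conn` type are assumptions made for illustration; bits 55:48 are simply left unused here.

```csharp
#nullable enable

// Numeric values are hypothetical; only the set of kinds comes from the layout above.
public enum OpKind : byte { Accept = 1, Recv, Send, Wake, Client, Cancel }

public static class Tag
{
    // [63:56] kind, [47:32] generation, [31:0] fd (or client-op slot).
    public static ulong Pack(OpKind kind, ushort gen, uint fd)
        => ((ulong)kind << 56) | ((ulong)gen << 32) | fd;

    public static OpKind Kind(ulong userData) => (OpKind)(userData >> 56);
    public static ushort Gen(ulong userData)  => (ushort)((userData >> 32) & 0xFFFF);
    public static uint Fd(ulong userData)     => (uint)(userData & 0xFFFFFFFF);
}

// Hypothetical stand-in for a connection-table entry.
public sealed class Conn
{
    public ushort Generation;
}

public static class Router
{
    // Returns the live connection for this CQE, or null if the tag is stale.
    public static Conn? Route(Conn?[] connections, ulong userData)
    {
        uint fd = Tag.Fd(userData);
        if (fd >= connections.Length)
            return null;

        Conn? conn = connections[fd];

        // Same fd, different generation: the CQE belongs to a previous tenant.
        return conn != null && conn.Generation == Tag.Gen(userData) ? conn : null;
    }
}
```

A recv tagged before teardown stops routing as soon as the generation is bumped, even though the fd slot is occupied again.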

Teardown also submits an ASYNC_CANCEL for the connection's multishot recv (matched by exact user_data), so a dead connection can't keep consuming buffers or race the fd's next tenant. If a connection's recv queue overflows - the handler isn't draining - the reactor cancels and tears it down rather than leaving it zombied.

Client ops (kind = client) skip the connection table entirely: the low 32 bits index a slot table holding the submission's completion object.

Inline resume

Every awaitable - ReadAsync, FlushAsync, every client op - is backed by a reusable IValueTaskSource core with RunContinuationsAsynchronously = false. When the reactor dispatches a CQE and calls SetResult, the awaiting handler continues right there, on the reactor thread, inside the dispatch loop. Zero allocation per await: connections are pooled and their cores are reused, with the connection generation as the token so an awaiter from a previous pool life resolves to a closed result instead of the new tenant's state.
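
The mechanism can be sketched with the BCL's ManualResetValueTaskSourceCore<T>. `RecvSource` is a hypothetical type, not the engine's; it uses the core's own version counter as the token, whereas the engine is described above as using the connection generation.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Sources;

// Hypothetical reusable awaitable backing a ReadAsync-style call.
public sealed class RecvSource : IValueTaskSource<int>
{
    // Mutable struct, so the field must not be readonly.
    private ManualResetValueTaskSourceCore<int> _core =
        new ManualResetValueTaskSourceCore<int> { RunContinuationsAsynchronously = false };

    // Handler side: an awaitable bound to the current version token. No allocation.
    public ValueTask<int> ReadAsync() => new ValueTask<int>(this, _core.Version);

    // Reactor side: called from CQE dispatch. The awaiting handler resumes
    // inside this call, on the calling thread.
    public void Complete(int bytesRead) => _core.SetResult(bytesRead);

    public int GetResult(short token)
    {
        bool valid = token == _core.Version;
        try { return _core.GetResult(token); }
        finally { if (valid) _core.Reset(); } // re-arm for the next await
    }

    public ValueTaskSourceStatus GetStatus(short token) => _core.GetStatus(token);

    public void OnCompleted(Action<object?> continuation, object? state, short token,
                            ValueTaskSourceOnCompletedFlags flags)
        => _core.OnCompleted(continuation, state, token, flags);
}
```

When a handler awaits `ReadAsync()` with no synchronization context captured and the reactor then calls `Complete(n)`, the code after the await has already run by the time `Complete` returns. That is the property that lets responses be staged before the loop re-enters the kernel.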

Leaving the reactor (and coming back)

Handlers may wander - await Task.Delay, any BCL async - and resume on the thread pool. Every reactor-touching operation checks the current thread: on the reactor it takes the direct path (write the SQE, touch the buf_ring); off it, the operation is queued - lock-free MPSC queues for buffer returns and flushes, a queue for client ops and recycles - and the reactor is woken through an eventfd registered as a multishot poll. The detour costs a queue hop and a syscall; the Playground's hop mode runs every request through it, end to end.
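
The thread check and the detour can be sketched as follows. `ReactorGate` is a hypothetical type; a ConcurrentQueue stands in for the lock-free MPSC queue, and the `wake` delegate stands in for the eventfd write.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical sketch of one reactor-touching operation (a buffer return).
public sealed class ReactorGate
{
    private readonly int _reactorThreadId;
    private readonly ConcurrentQueue<int> _returnQ = new ConcurrentQueue<int>();
    private readonly Action _wake;

    // Buffers handed back to the ring; touched only on the reactor thread.
    public List<int> Recycled { get; } = new List<int>();

    public ReactorGate(int reactorThreadId, Action wake)
    {
        _reactorThreadId = reactorThreadId;
        _wake = wake;
    }

    public void ReturnBuffer(int bufferId)
    {
        if (Environment.CurrentManagedThreadId == _reactorThreadId)
        {
            Recycled.Add(bufferId);   // direct path: touch the buf_ring right away
            return;
        }

        _returnQ.Enqueue(bufferId);   // off-reactor: hand the work over...
        _wake();                      // ...and wake the loop so it drains the queue
    }

    // Called at the top of the reactor loop. Returns how many items were drained.
    public int DrainReturnQ()
    {
        int drained = 0;
        while (_returnQ.TryDequeue(out int bufferId))
        {
            Recycled.Add(bufferId);
            drained++;
        }
        return drained;
    }
}
```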

Connection lifetime

A connection has exactly two owners: the reactor (recv side) and the handler. The refcount starts at 2 on accept; each owner releases once - the reactor on EOF/error, the handler via conn.DecRef() on exit (exactly once, in a finally). Whoever reaches zero hands the connection to the reactor for recycling: cancel the multishot recv, return leftover buffers, close the fd, bump the generation (invalidating stale awaiters and queued work), reset state, and push to the pool (capped by PoolMax; beyond it, native memory is freed). Pooled connections keep their slab and buffer-ring allocations across lives.
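
The two-owner rule reduces to an atomic counter where exactly one caller observes zero. `ConnLifetime` below is a hypothetical sketch: the real conn.DecRef() signature is not shown in this page, and the recycle step here only bumps the generation.

```csharp
using System.Threading;

// Hypothetical sketch of the refcount handshake between reactor and handler.
public sealed class ConnLifetime
{
    private int _refs;

    public ushort Generation { get; private set; }

    // Two owners from the start: the reactor (recv side) and the handler.
    public void OnAccept() => Volatile.Write(ref _refs, 2);

    // Safe to call from either thread. Returns true for exactly one caller,
    // the one that released the last reference and must trigger recycling.
    public bool DecRef() => Interlocked.Decrement(ref _refs) == 0;

    // Runs on the reactor thread once the count reaches zero. The real
    // sequence also cancels the recv, returns buffers, and closes the fd.
    public void Recycle()
    {
        Generation++; // invalidates stale awaiters and queued work
    }
}
```

On the handler side this pairs with a try/finally around the per-connection loop, so the release happens exactly once on every exit path.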

Wiring and services

Three seams connect an application: Handle (the per-connection loop), OnStart (runs on the reactor thread before serving - open ring-native clients here so they bind to this reactor's ring), and typed services (AddService<T> / GetService<T>) so one reactor can carry any number of clients. The engine never names a client type - see custom clients.

Configuration

Option	Default	Meaning
`ReactorCount`	12	reactors = threads; run one per core
`RingEntries`	8192	io_uring SQ/CQ depth
`DualStack`	false	bind listeners (TCP and UDP) as dual-stack IPv6
`RecvBufferSize`	32 KB	shared mode: bytes per recv buffer
`BufferRingEntries`	4096	shared mode: buffers per reactor (power of two)
`Incremental`	false	per-connection buffer rings (kernel 6.12+)
`MaxConnections`	4096	incremental: one buffer-group id per live connection
`ConnBufRingEntries`	16	incremental: buffers per connection ring
`IncRecvBufferSize`	4 KB	incremental: bytes per buffer (kernel appends into it)
`Tcp.Port`	8080	SO_REUSEPORT listener port (every reactor binds it)
`Tcp.ExtraPorts`	[]	additional listener ports; `conn.ListenerPort` says which one a connection used
`Tcp.ListenBacklog`	1024	accept-queue depth per listener
`Tcp.WriteSlabSize`	16 KB	per-connection write buffer
`Tcp.PoolMax`	1024	pooled connection objects per reactor (bounds native memory)
`Tcp.RecvQueueEntries`	64	per-connection slice queue depth; overflow closes the connection
`Udp.Ports`	[]	raw UDP sockets; datagrams reach `Reactor.OnDatagram`
`Udp.RecvSlots`	16	shared UDP provided-buffer ring depth (in-flight datagrams)
`Udp.Gro`	true	UDP_GRO: coalesce equal-size datagram bursts into one completion
`Quic`	null	enable the QUIC transport - see QUIC & HTTP/3
