
io_uring

io_uring is a Linux kernel interface for asynchronous I/O. zerg uses it as its sole I/O mechanism – there are no epoll, kqueue, or libuv fallbacks.

The Ring Model

How io_uring Works

io_uring uses two lock-free ring buffers shared between userspace and the kernel. Your app writes SQEs (requests), the kernel writes CQEs (results). No syscall is needed to enqueue; a single io_uring_enter() submits the whole batch.

[Diagram] Your C# app and the kernel share two mmap'd rings: your code preps SQEs (prep_recv, prep_send, ...) into the Submission Queue, the kernel reads the SQ, performs the I/O (accept / recv / send), and writes CQEs into the Completion Queue for your handler to process and route. SQE fields: opcode, fd, buf/len, user_data, flags. CQE fields: user_data, res, flags.

SQE — Submission Queue Entry

opcode = what to do (recv, send, accept)
fd = which socket
user_data = your 64-bit tag (returned in CQE)
flags = BUFFER_SELECT, etc.

CQE — Completion Queue Entry

user_data = your original tag (identifies the op)
res = result (bytes transferred, new fd, or -errno)
flags = MORE, BUFFER (contains buffer_id)

Shared Memory

Both rings are mmap'd. The kernel and your app write to them directly. No copy, no syscall for enqueue. Only io_uring_enter() is needed to wake the kernel.


The I/O Lifecycle

Step through the exact sequence: your app queues an SQE, the kernel processes it, and you read the CQE result. All in shared memory.

1. get_sqe() - grab an empty SQE slot
2. prep_recv(sqe, fd) - fill opcode + fd + flags
3. set_data64(sqe, tag) - attach your 64-bit token
4. submit() - io_uring_enter crosses the kernel boundary
5. Kernel processes the I/O - recv(fd) moves data into a buffer
6. CQE written to the CQ - user_data + res + flags
7. App reads the CQE - dispatch by user_data
8. cq_advance(count) - mark the CQEs consumed
zerg — single syscall pattern
// 1. Queue work (no syscall)
io_uring_sqe* sqe = shim_get_sqe(ring);
shim_prep_recv_multishot_select(sqe, fd, bgid, 0);
shim_sqe_set_data64(sqe, PackUd(UdKind.Recv, fd));

// 2. Submit + wait in ONE syscall
shim_submit_and_wait_timeout(ring, &cqes, 1, &ts);

// 3. Batch-read completions (no syscall)
int got = shim_peek_batch_cqe(ring, cqes, batchSize);

// 4. Process results
for (int i = 0; i < got; i++) {
    UdKind kind = UdKindOf(shim_cqe_get_data64(cqes[i]));
    int res = cqes[i]->res;
    // dispatch…
}

// 5. Mark consumed
shim_cq_advance(ring, (uint)got);

Key Feature

Multishot Operations

Traditional I/O: 1 SQE → 1 CQE. Multishot: 1 SQE → many CQEs. The kernel keeps producing completions until an error or you cancel.

Traditional (one-shot)

Submit a recv for each read: SQE recv → CQE data, repeated for every message. Four reads means 4 SQEs, 4 submissions, 4 completions. Cost: re-arm after every read, more SQE slots consumed, more CPU cycles spent on submission.

Multishot (zerg)

Submit once, get many completions: 1 SQE (recv_multishot) produces CQE after CQE, each with IORING_CQE_F_MORE set (F_MORE = 1), until a final CQE with F_MORE = 0 signals that the multishot ended and must be re-armed. Win: 1 submission, N completions. zerg uses multishot for both accept and recv.

user_data Packing

Each SQE carries a 64-bit token so the completion handler knows what operation completed and on which socket.

64-bit user_data layout:

UdKind (bits 63-32) = 1:Accept, 2:Recv, 3:Send, 4:Cancel
File descriptor (bits 31-0) = socket fd cast to uint

PackUd(kind, fd) = ((ulong)kind << 32) | (uint)fd
Zero Copy

Provided Buffer Ring

Instead of passing a buffer with each recv, you pre-register a pool. The kernel picks one, fills it, and tells you which ID it used. You return it when done.

Buffer slab (NativeMemory): buf 0 ... buf N, 32KB each.
Buffer ring (shared with the kernel): slots id:0 ... id:N; a slot is "used" while the kernel holds its buffer.

Recv flow:
1. Kernel picks a buffer from the ring.
2. recv() fills it with data.
3. CQE.flags contains the buffer id: bid = flags >> 16.
4. CQE.res = bytes received.
5. App returns the buffer via buf_ring_add.

Example: recv(fd) picks buf 2 and fills 1,420 bytes:
CQE { user_data: [Recv|fd], res: 1420, flags: (2 << 16) | F_BUFFER | F_MORE }

Buffer return (app to ring): connection.ReturnRing(bid) feeds an MPSC queue; the reactor drains it and calls shim_buf_ring_add(ring, addr, len, bid, mask, idx) followed by shim_buf_ring_advance(1).
End to End

Full zerg Flow

Walk through the complete lifecycle: client connects, data flows in, your app responds, buffers recycle. Step through each phase one at a time.

Acceptor thread: the client's TCP SYN reaches the kernel, where the armed multishot accept completes and produces a CQE with res = the new client fd (e.g. 42) and F_MORE = 1, meaning the accept stays armed. The acceptor then picks the next reactor round-robin (next % N) and hands the fd off. Multishot accept is armed once at startup; each new connection produces a CQE automatically, and the acceptor never re-submits because F_MORE tells us the kernel will keep producing CQEs.

Features Used by zerg

Multishot Accept

A single SQE arms the kernel to produce one CQE per accepted connection indefinitely. The acceptor thread never re-arms. Each CQE contains the new client fd in cqe->res and IORING_CQE_F_MORE to indicate more will follow.

Multishot Recv + Buffer Selection

A single SQE arms recv for a connection. Each time data arrives, the kernel picks a buffer from the buf_ring, fills it, and produces a CQE with the buffer ID in the flags. Eliminates per-recv buffer allocation.

Buffer Rings (Provided Buffers)

Pre-allocated buffer pool registered with the kernel via shim_setup_buf_ring(). Buffers are added with buf_ring_add() and recycled after use. See Buffer Rings for the full lifecycle.

SINGLE_ISSUER

Tells the kernel only one thread submits to this ring. Skips SQ locking for better throughput. Matches zerg's model where each reactor is the sole submitter to its ring.

DEFER_TASKRUN

Defers kernel task_work until the next io_uring_enter() on the ring. Reduces latency spikes from interrupt-context work and makes completions arrive at predictable points for better async/await integration.

SQPOLL (Optional)

Creates a kernel thread polling the SQ continuously, eliminating the io_uring_enter() syscall. Trades a dedicated CPU core for the lowest possible submission latency.

Submit-and-Wait

zerg's reactor uses shim_submit_and_wait_timeout() — a single syscall that submits all pending SQEs AND waits for at least one CQE. One syscall instead of two.

CQE Batching

Instead of one CQE at a time, the reactor peeks a batch with shim_peek_batch_cqe() and processes all before advancing the CQ head. Amortizes the head update across completions.