io_uring
io_uring is a Linux kernel interface for asynchronous I/O. zerg uses it as its sole I/O mechanism – there are no epoll, kqueue, or libuv fallbacks.
How io_uring Works
io_uring uses two lock-free ring buffers shared between userspace and the kernel. Your app writes SQEs (requests); the kernel writes CQEs (results). Enqueueing a request needs no syscall.
SQE — Submission Queue Entry
opcode = what to do (recv, send, accept)
fd = which socket
user_data = your 64-bit tag (returned in the CQE)
flags = BUFFER_SELECT, etc.
CQE — Completion Queue Entry
user_data = your original tag (identifies the op)
res = result (bytes transferred, new fd, or -errno)
flags = MORE, BUFFER (contains buffer_id)
Shared Memory
Both rings are mmap'd; the kernel and your app write to them directly. No copy, no syscall for enqueue. Only io_uring_enter() is needed to wake the kernel.
The I/O Lifecycle
The exact sequence: your app queues an SQE, the kernel processes it, and you read the CQE result, all in shared memory.
```
// 1. Queue work (no syscall)
io_uring_sqe* sqe = shim_get_sqe(ring);
shim_prep_recv_multishot_select(sqe, fd, bgid, 0);
shim_sqe_set_data64(sqe, PackUd(UdKind.Recv, fd));

// 2. Submit + wait in ONE syscall
shim_submit_and_wait_timeout(ring, &cqes, 1, &ts);

// 3. Batch-read completions (no syscall)
int got = shim_peek_batch_cqe(ring, cqes, batchSize);

// 4. Process results
for (int i = 0; i < got; i++) {
    UdKind kind = UdKindOf(shim_cqe_get_data64(cqes[i]));
    int res = cqes[i]->res;
    // dispatch…
}

// 5. Mark consumed
shim_cq_advance(ring, (uint)got);
```
Multishot Operations
Traditional I/O: 1 SQE → 1 CQE. Multishot: 1 SQE → many CQEs. The kernel keeps producing completions until an error occurs or you cancel the operation.
(Diagram: traditional one-shot vs. multishot as used by zerg.)
user_data Packing
Each SQE carries a 64-bit token so the completion handler knows what operation completed and on which socket.
Provided Buffer Ring
Instead of passing a buffer with each recv, you pre-register a pool. The kernel picks one, fills it, and tells you which ID it used. You return it when done.
Full zerg Flow
The complete lifecycle: a client connects, data flows in, your app responds, and buffers recycle.
Features Used by zerg
Multishot Accept
A single SQE arms the kernel to produce one CQE per accepted connection indefinitely. The acceptor thread never re-arms. Each CQE contains the new client fd in cqe->res and IORING_CQE_F_MORE to indicate more will follow.
Multishot Recv + Buffer Selection
A single SQE arms recv for a connection. Each time data arrives, the kernel picks a buffer from the buf_ring, fills it, and produces a CQE with the buffer ID in the flags. Eliminates per-recv buffer allocation.
Buffer Rings (Provided Buffers)
Pre-allocated buffer pool registered with the kernel via shim_setup_buf_ring(). Buffers are added with buf_ring_add() and recycled after use. See Buffer Rings for the full lifecycle.
SINGLE_ISSUER
Tells the kernel only one thread submits to this ring. Skips SQ locking for better throughput. Matches zerg's model where each reactor is the sole submitter to its ring.
DEFER_TASKRUN
Defers kernel task_work until the app next enters the ring (io_uring_enter()). Reduces latency spikes from interrupt-context work and makes completions arrive at predictable points for better async/await integration.
SQPOLL (Optional)
Creates a kernel thread polling the SQ continuously, eliminating the io_uring_enter() syscall. Trades a dedicated CPU core for the lowest possible submission latency.
Submit-and-Wait
zerg's reactor uses shim_submit_and_wait_timeout() — a single syscall that submits all pending SQEs AND waits for at least one CQE. One syscall instead of two.
CQE Batching
Instead of one CQE at a time, the reactor peeks a batch with shim_peek_batch_cqe() and processes all before advancing the CQ head. Amortizes the head update across completions.