gcannon
gcannon is a custom HTTP load generator built specifically for HttpArena. It uses Linux’s io_uring interface for high-performance, zero-copy networking, so that the benchmarking tool itself is much less likely to become the bottleneck.
Why a custom load generator?
Traditional HTTP benchmarking tools like wrk or ab use epoll or kqueue for I/O multiplexing. While effective, these tools can become the bottleneck when testing extremely fast servers (millions of requests per second). gcannon uses io_uring to push the client-side ceiling higher.
Architecture
gcannon spawns N worker threads, each managing its own:
- io_uring ring with `IORING_SETUP_SINGLE_ISSUER` and `IORING_SETUP_DEFER_TASKRUN` for minimal kernel overhead
- Provided buffer ring for zero-copy multishot receives – the kernel writes directly into pre-registered buffers
- Connection pool – each thread manages `connections / threads` TCP connections
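As a rough illustration of the `connections / threads` split, the following Python sketch computes how many connections each worker would own. The function name `split_connections` is hypothetical, and the handling of a remainder (one extra connection for the first few threads) is an assumption; the text above does not say what gcannon does when the division is not exact.

```python
def split_connections(total_connections: int, threads: int) -> list[int]:
    """Return the number of connections each worker thread would own."""
    if threads <= 0:
        raise ValueError("threads must be positive")
    if total_connections < 0:
        raise ValueError("total_connections must not be negative")
    base, extra = divmod(total_connections, threads)
    # Assumption: leftover connections go one each to the first `extra` threads.
    return [base + 1 if i < extra else base for i in range(threads)]
```

With `-c 512 -t 8` this gives eight workers with 64 connections each.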
I/O flow
- Connect – async `io_uring_prep_connect` for non-blocking TCP setup
- Send – pipelined requests are pre-built into a single buffer and sent with `io_uring_prep_send`
- Receive – multishot `io_uring_prep_recv_multishot` with provided buffers; a single SQE produces multiple CQEs as data arrives
- Parse – a streaming HTTP response parser counts completed responses and extracts headers
- Refill – as responses complete, new requests are fired to maintain pipeline depth
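The Parse step can be sketched in Python as an incremental counter that is fed whatever bytes each receive completion delivers. This is only an illustration, not gcannon's parser: the class name `ResponseCounter` is hypothetical, and the sketch assumes every response is framed with `Content-Length` (no chunked encoding) and extracts only that one header.

```python
class ResponseCounter:
    """Counts complete HTTP/1.1 responses in a byte stream.

    Assumption: responses use Content-Length framing only.
    """

    def __init__(self) -> None:
        self._buf = bytearray()
        self._body_left = 0  # body bytes still expected for the current response
        self.completed = 0

    def feed(self, data: bytes) -> int:
        """Consume one received chunk; return how many responses it completed."""
        before = self.completed
        self._buf += data
        while True:
            if self._body_left:
                # Mid-body: discard as many body bytes as are available.
                take = min(self._body_left, len(self._buf))
                del self._buf[:take]
                self._body_left -= take
                if self._body_left:
                    break  # body incomplete, wait for more data
                self.completed += 1
                continue
            end = self._buf.find(b"\r\n\r\n")
            if end < 0:
                break  # header block incomplete, wait for more data
            head = bytes(self._buf[:end])
            del self._buf[:end + 4]
            length = 0
            for line in head.split(b"\r\n")[1:]:  # skip the status line
                name, _, value = line.partition(b":")
                if name.strip().lower() == b"content-length":
                    length = int(value.strip())
            if length == 0:
                self.completed += 1
            else:
                self._body_left = length
        return self.completed - before
```

The return value of `feed` is the number of responses finished by that chunk, which is the figure the Refill step needs.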
CQE batching
gcannon processes completions in batches using `io_uring_peek_batch_cqe`. With `DEFER_TASKRUN`, SEND and RECV completions can arrive in the same batch, which requires careful handling to avoid pipeline stalls.
Request templates
gcannon supports two modes:
URL mode
When given a plain URL (e.g., http://host:8080/pipeline), gcannon generates a standard HTTP/1.1 GET request and replicates it N times for pipelining.
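A minimal Python sketch of what URL mode produces: one GET request built from the URL and repeated N times in a single buffer. The function name `build_pipelined_get` is hypothetical, and the exact header set gcannon emits is not documented here, so the sketch assumes the smallest valid HTTP/1.1 request (request line plus `Host`).

```python
from urllib.parse import urlsplit


def build_pipelined_get(url: str, depth: int) -> bytes:
    """Build `depth` identical GET requests back to back in one buffer."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # Assumption: only the request line and Host header are sent.
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "\r\n"
    ).encode("ascii")
    return request * depth
```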
Raw template mode (--raw)
When given raw request files (e.g., --raw get.raw,post_cl.raw,post_chunked.raw), gcannon sends the exact bytes from each file. This enables:
- Mixed GET/POST workloads
- Requests with specific headers, body encodings, or query parameters
- Bit-perfect request reproduction
Templates are assigned round-robin to connections, so each connection sends one request type consistently.
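Round-robin assignment can be illustrated with a short Python sketch. The helper name `assign_templates` is hypothetical, and indexing connections globally from zero is an assumption; the ordering could equally be per thread.

```python
def assign_templates(templates: list[str], connections: int) -> list[str]:
    """Return the template used by each connection, assigned round-robin."""
    if not templates:
        raise ValueError("at least one template is required")
    # Connection i always uses the same template for its whole lifetime.
    return [templates[i % len(templates)] for i in range(connections)]
```

With three templates and five connections, connections 0 and 3 both send `get.raw`, connections 1 and 4 send `post_cl.raw`, and connection 2 sends `post_chunked.raw`.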
Pipelining
With -p N, gcannon sends N requests in a single write operation. As responses arrive, it refills the pipeline proportionally – if 5 responses are received, 5 new requests are queued. This maintains steady pressure without overwhelming the server’s receive buffer.
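The proportional refill can be modelled with a small piece of per-connection bookkeeping. This Python sketch is an assumption-laden illustration: the class `PipelineState` is hypothetical, and the way the `-r` request limit caps refills is a guess at reasonable behaviour rather than documented gcannon logic.

```python
class PipelineState:
    """Tracks how many requests one connection should send next."""

    def __init__(self, depth: int, max_requests: int = 0) -> None:
        self.depth = depth
        self.max_requests = max_requests  # 0 = unlimited, mirroring -r
        self.sent = 0
        self.in_flight = 0

    def _budget(self, wanted: int) -> int:
        # Assumption: a per-connection limit caps the total requests sent.
        if self.max_requests == 0:
            return wanted
        return min(wanted, self.max_requests - self.sent)

    def start(self) -> int:
        """Initial burst: fill the pipeline to its configured depth."""
        n = self._budget(self.depth)
        self.sent += n
        self.in_flight += n
        return n

    def on_responses(self, completed: int) -> int:
        """Refill one request for every response that just completed."""
        self.in_flight -= completed
        n = self._budget(completed)
        self.sent += n
        self.in_flight += n
        return n
```

With a depth of 16, receiving 5 responses queues exactly 5 new requests, so the number in flight returns to 16.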
Latency measurement
Latency is measured per-request from the moment the send SQE is prepared (not when the kernel completes the send) to when the corresponding response is fully parsed. This captures the true round-trip time as seen by the application.
For pipelined requests, each request in the batch gets its own send timestamp, and latencies are matched FIFO to response completions.
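FIFO matching works because HTTP/1.1 responses on a connection come back in request order. A minimal Python sketch, with the hypothetical class name `LatencyTracker`; timestamps are passed in explicitly so the example is deterministic, whereas a real client would read a monotonic clock (for example `time.monotonic_ns()`) at each point.

```python
from collections import deque


class LatencyTracker:
    """Matches response completions to send timestamps in FIFO order."""

    def __init__(self) -> None:
        self._sent_at: deque[int] = deque()
        self.latencies: list[int] = []

    def on_request_queued(self, now: int) -> None:
        """Record the time at which a request's send was prepared."""
        self._sent_at.append(now)

    def on_response_parsed(self, now: int) -> int:
        """Pair a fully parsed response with the oldest outstanding request."""
        if not self._sent_at:
            raise RuntimeError("response received with no request outstanding")
        latency = now - self._sent_at.popleft()
        self.latencies.append(latency)
        return latency
```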
Command-line reference
Usage: gcannon <url> -c <conns> -t <threads> -d <duration>
[-p <pipeline>] [-r <req/conn>]
[-R|--raw file1,file2,...]

| Flag | Description | Default |
|---|---|---|
| `-c` | Total connections | required |
| `-t` | Worker threads | required |
| `-d` | Test duration (e.g., 5s, 30s) | required |
| `-p` | Pipeline depth | 1 |
| `-r` | Requests per connection (0 = unlimited) | 0 |
| `--raw` | Comma-separated raw request template files | – |
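For scripted runs, the flags above can be assembled into an argument list. The Python helper below is a hypothetical sketch: it assumes a `gcannon` binary is on `PATH` and that the URL is still passed in raw template mode, as the usage line suggests. Optional flags are omitted when they equal their documented defaults.

```python
def gcannon_argv(url: str, conns: int, threads: int, duration: str,
                 pipeline: int = 1, reqs_per_conn: int = 0,
                 raw_files: tuple[str, ...] = ()) -> list[str]:
    """Build an argument list using only the flags documented above."""
    argv = ["gcannon", url,
            "-c", str(conns), "-t", str(threads), "-d", duration]
    if pipeline != 1:
        argv += ["-p", str(pipeline)]
    if reqs_per_conn != 0:
        argv += ["-r", str(reqs_per_conn)]
    if raw_files:
        # Raw templates are passed as one comma-separated value.
        argv += ["--raw", ",".join(raw_files)]
    return argv
```

The resulting list can be handed to `subprocess.run(argv, check=True)`.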