CHUNK-BARE-SEMICOLON
| Field | Value |
|---|---|
| Test ID | SMUG-CHUNK-BARE-SEMICOLON |
| Category | Smuggling |
| RFC | RFC 9112 §7.1.1 |
| Requirement | MUST reject |
| Expected | 400 or close |
What it sends
A chunk whose size line is 5; (a semicolon with no extension name after it).
POST / HTTP/1.1\r\n
Host: localhost:8080\r\n
Transfer-Encoding: chunked\r\n
\r\n
5;\r\n
hello\r\n
0\r\n
\r\n
Apart from the bare semicolon on the 5; line, this is a well-formed chunked request.
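The following minimal Python sketch sends this probe and applies the "400 or close" expectation. The names PROBE, classify and send_probe are hypothetical. Treating every status other than 400 as a failure is an assumption about how the test is scored.

```python
import socket

# Raw bytes of the SMUG-CHUNK-BARE-SEMICOLON request shown above.
PROBE = (
    b"POST / HTTP/1.1\r\n"
    b"Host: localhost:8080\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"5;\r\n"        # chunk-size 5, then ';' with no chunk-ext-name
    b"hello\r\n"     # 5 bytes of chunk-data + CRLF
    b"0\r\n"         # last-chunk
    b"\r\n"          # final CRLF (no trailer fields)
)


def classify(response: bytes) -> str:
    """Apply the '400 or close' expectation to a raw response."""
    if not response:
        return "pass (connection closed)"
    status_line = response.split(b"\r\n", 1)[0]
    parts = status_line.split(b" ", 2)
    if len(parts) >= 2 and parts[1] == b"400":
        return "pass (400)"
    return "fail (" + status_line.decode("latin-1") + ")"


def send_probe(host: str = "localhost", port: int = 8080, timeout: float = 5.0) -> str:
    """Send PROBE once and classify whatever the server returns."""
    received = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(PROBE)
        try:
            while True:
                data = sock.recv(4096)
                if not data:          # peer closed the connection
                    break
                received.append(data)
        except (socket.timeout, ConnectionResetError):
            pass                      # stop reading; judge what arrived
    return classify(b"".join(received))
```

If a server answers 400 but keeps the connection open, send_probe waits for the timeout before it returns. That delay is acceptable for a one-off probe.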
What the RFC says
chunk-ext = *( BWS ";" BWS chunk-ext-name [ BWS "=" BWS chunk-ext-val ] )
chunk-ext-name = token
— RFC 9112 §7.1.1
The grammar requires a chunk-ext-name (which is a token, i.e., one or more tchar characters) after each semicolon. A bare semicolon with no extension name does not match the production and is therefore invalid.
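The core of the argument is that a token cannot be empty. Here is a short Python sketch of that check, using the tchar set from RFC 9110 §5.6.2. The helper name is_token is hypothetical.

```python
import string

# tchar per RFC 9110 §5.6.2; token = 1*tchar
TCHAR = frozenset("!#$%&'*+-.^_`|~" + string.digits + string.ascii_letters)


def is_token(s: str) -> bool:
    """True if s is one or more tchar characters."""
    return len(s) > 0 and all(c in TCHAR for c in s)


# In "5;" nothing sits between the ';' and the CRLF, so the would-be
# chunk-ext-name is the empty string, which is not a token.
name = "5;".split(";", 1)[1]
print(repr(name), is_token(name))       # '' False
print(is_token("foo"), is_token("\r"))  # True False
```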
Why it matters
A bare semicolon invites disagreement about where a chunk's size line ends. A lenient parser might skip the empty extension and parse the chunk normally, while a strict parser rejects the line. When two such parsers sit in sequence (a front-end proxy forwarding to a back-end server), they disagree on whether the message is valid, and that disagreement is what enables request smuggling.
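The disagreement is easy to reproduce with two toy chunk-size parsers in Python. Both are hypothetical illustrations, not any real server's code. lenient_size discards everything after the first ';'. strict_size requires a non-empty token name after every ';'.

```python
import string

HEX = frozenset(string.hexdigits)
TCHAR = frozenset("!#$%&'*+-.^_`|~" + string.digits + string.ascii_letters)


def lenient_size(line: str) -> int:
    """Ignores everything from the first ';' on, so '5;' parses as 5."""
    return int(line.split(";", 1)[0].strip(" \t"), 16)


def strict_size(line: str) -> int:
    """Rejects any ';' not followed by a non-empty token name.

    Simplification: values are not validated, and quoted-string values
    containing ';' are not handled.
    """
    size, *exts = line.split(";")
    size = size.strip(" \t")
    if not size or not set(size) <= HEX:
        raise ValueError(f"bad chunk-size in {line!r}")
    for ext in exts:
        name = ext.split("=", 1)[0].strip(" \t")
        if not name or not set(name) <= TCHAR:
            raise ValueError(f"missing or invalid chunk-ext-name in {line!r}")
    return int(size, 16)


# Chunk-size lines are given without their trailing CRLF.
for parse in (lenient_size, strict_size):
    try:
        print(parse.__name__, "accepts, size =", parse("5;"))
    except ValueError as err:
        print(parse.__name__, "rejects:", err)
```

On the probe's line 5;, the lenient parser returns 5 and the strict one raises ValueError. A front-end/back-end pair must never reach this kind of split verdict.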
Deep Analysis
Relevant ABNF (RFC 9112 §7.1 and §7.1.1; token and tchar from RFC 9110 §5.6.2)
chunk          = chunk-size [ chunk-ext ] CRLF
                 chunk-data CRLF
chunk-size     = 1*HEXDIG
chunk-ext      = *( BWS ";" BWS chunk-ext-name
                    [ BWS "=" BWS chunk-ext-val ] )
chunk-ext-name = token
chunk-ext-val  = token / quoted-string
token          = 1*tchar
tchar          = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+"
               / "-" / "." / "^" / "_" / "`" / "|" / "~"
               / DIGIT / ALPHA
RFC Evidence
RFC 9112 §7.1.1 defines the chunk extension grammar:
“chunk-ext = *( BWS ";" BWS chunk-ext-name [ BWS "=" BWS chunk-ext-val ] )”
In other words, a chunk-ext-name is mandatory after the semicolon delimiter and any optional whitespace. chunk-ext-name is defined as token, which is 1*tchar, so it must contain at least one character.
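One way to check a chunk-size line against this grammar is to translate the ABNF directly into a regular expression. The Python sketch below does this for chunk-size [ chunk-ext ] CRLF. The constant names are illustrative, and quoted-string follows the qdtext and quoted-pair rules of RFC 9110 §5.6.4.

```python
import re

# RFC 9112 §7.1/§7.1.1 plus RFC 9110 §5.6 rules, translated to a bytes regex.
TCHAR = rb"[!#$%&'*+\-.^_`|~0-9A-Za-z]"
TOKEN = TCHAR + rb"+"                                  # token = 1*tchar
BWS = rb"[ \t]*"                                       # BWS = OWS = *( SP / HTAB )
QDTEXT = rb"[\t \x21\x23-\x5B\x5D-\x7E\x80-\xFF]"
QUOTED_PAIR = rb"\\[\t \x21-\x7E\x80-\xFF]"
QUOTED_STRING = rb'"(?:' + QDTEXT + rb"|" + QUOTED_PAIR + rb')*"'
CHUNK_EXT = (rb"(?:" + BWS + rb";" + BWS + TOKEN
             + rb"(?:" + BWS + rb"=" + BWS
             + rb"(?:" + TOKEN + rb"|" + QUOTED_STRING + rb"))?)*")
CHUNK_LINE = re.compile(rb"[0-9A-Fa-f]+" + CHUNK_EXT + rb"\r\n")


def valid_chunk_line(line: bytes) -> bool:
    """chunk-size [ chunk-ext ] CRLF, matched against the whole line."""
    return CHUNK_LINE.fullmatch(line) is not None
```

Because fullmatch anchors both ends, the bare ';' cannot be silently skipped. The * repetition can only match zero extensions, and the leftover ';' then fails to match CRLF.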
RFC 9112 §7.1.1 also states:
“A recipient MUST ignore unrecognized chunk extensions.”
This applies to well-formed but unknown extensions, not to syntactically invalid ones like a bare semicolon with no name.
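The distinction can be sketched in Python. A hypothetical recipient understands one extension, named signature here purely for illustration. It ignores every other well-formed extension and rejects anything that does not parse. Values are restricted to tokens to keep the sketch short.

```python
import re

TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# One chunk-ext repetition: BWS ";" BWS name [ BWS "=" BWS value ].
# Values are limited to tokens here; quoted-string is omitted for brevity.
EXT = re.compile(r"[ \t]*;[ \t]*(" + TOKEN + r")(?:[ \t]*=[ \t]*(" + TOKEN + r"))?")

# Hypothetical: the only extension this recipient understands.
KNOWN = {"signature"}


def parse_extensions(ext_text: str) -> dict:
    """Parse the text between chunk-size and CRLF.

    Well-formed but unrecognized extensions are ignored; anything that
    does not match the grammar raises ValueError.
    """
    found, pos = {}, 0
    while pos < len(ext_text):
        m = EXT.match(ext_text, pos)
        if m is None:
            raise ValueError(f"malformed chunk-ext at offset {pos}: {ext_text[pos:]!r}")
        name, value = m.group(1), m.group(2)
        if name in KNOWN:
            found[name] = value
        # else: unrecognized but well-formed, so it MUST be ignored
        pos = m.end()
    return found
```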
RFC 9112 §7.1.1 further notes:
“A server ought to limit the total length of chunk extensions received in a request to an amount reasonable for the services provided, in the same way that it applies length limitations and timeouts for other parts of a message, and generate an appropriate 4xx (Client Error) response if that amount is exceeded.”
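Here is a minimal sketch of such a limit. It assumes a per-request budget of 4096 bytes; MAX_CHUNK_EXT_BYTES is a hypothetical value, because the RFC deliberately leaves the amount to the server. It also assumes 400 as the chosen 4xx status.

```python
import string
from typing import Optional

HEXDIG = frozenset(string.hexdigits.encode())

# Hypothetical limit: RFC 9112 leaves the exact amount to the server.
MAX_CHUNK_EXT_BYTES = 4096


def ext_length(line: bytes) -> int:
    """Bytes between chunk-size and CRLF on one chunk line."""
    body = line[:-2] if line.endswith(b"\r\n") else line
    i = 0
    while i < len(body) and body[i] in HEXDIG:
        i += 1
    return len(body) - i


class ExtensionBudget:
    """Tracks chunk-extension bytes across one request body."""

    def __init__(self, limit: int = MAX_CHUNK_EXT_BYTES) -> None:
        self.limit = limit
        self.used = 0

    def charge(self, line: bytes) -> Optional[int]:
        """Add this line's extension bytes; return a 4xx status once over budget."""
        self.used += ext_length(line)
        if self.used > self.limit:
            return 400  # assumption: 400 as the "appropriate 4xx"
        return None
```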
Step-by-Step ABNF Violation
- The parser reads chunk-size and gets 5 (a valid HEXDIG).
- The parser encounters ";", which starts a chunk-ext production.
- Inside chunk-ext, after the ";" and optional BWS, the parser expects chunk-ext-name, which is token = 1*tchar.
- The next byte is \r (0x0D, the start of CRLF). \r is not a tchar; it is a control character.
- A token requires at least one tchar. Zero tchar characters means the chunk-ext-name production fails, so this repetition of chunk-ext cannot complete.
- The *( ... ) in chunk-ext can still match zero repetitions, but that leaves the ";" unconsumed, and chunk requires CRLF immediately after chunk-size [ chunk-ext ]. Since ";" is not CRLF, the entire chunk production fails. The message is syntactically invalid.
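The same walk can be written as a small hand-rolled parser whose comments map to the steps above. This is a sketch, not any particular server's code. It commits to an extension as soon as it sees ';', which is safe because a ';' after chunk-size can only begin a chunk-ext. It accepts only token values, not quoted-string.

```python
TCHAR = frozenset(b"!#$%&'*+-.^_`|~0123456789"
                  b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
HEXDIG = frozenset(b"0123456789ABCDEFabcdef")
BWS = frozenset(b" \t")


def skip(line: bytes, i: int, allowed: frozenset) -> int:
    """Return the first index at or after i whose byte is not in `allowed`."""
    while i < len(line) and line[i] in allowed:
        i += 1
    return i


def parse_chunk_line(line: bytes) -> int:
    """Walk chunk-size [ chunk-ext ] CRLF and return the chunk size.

    Raises ValueError naming the production that failed. Extension values
    are limited to tokens (no quoted-string) in this sketch.
    """
    i = skip(line, 0, HEXDIG)                      # step 1: chunk-size = 1*HEXDIG
    if i == 0:
        raise ValueError("chunk-size: expected at least one HEXDIG")
    size = int(line[:i], 16)
    while True:
        j = skip(line, i, BWS)
        if line[j:j + 1] != b";":                  # no further chunk-ext repetition
            break
        j = skip(line, j + 1, BWS)                 # steps 2-3: ";" then BWS
        name_end = skip(line, j, TCHAR)            # steps 3-5: token = 1*tchar
        if name_end == j:
            raise ValueError(f"chunk-ext-name: expected tchar at offset {j}, "
                             f"found {line[j:j + 1]!r}")
        i = name_end
        k = skip(line, i, BWS)
        if line[k:k + 1] == b"=":                  # optional "=" chunk-ext-val
            k = skip(line, k + 1, BWS)
            value_end = skip(line, k, TCHAR)
            if value_end == k:
                raise ValueError(f"chunk-ext-val: expected token at offset {k}")
            i = value_end
    if line[i:] != b"\r\n":                        # step 6: only CRLF may follow
        raise ValueError(f"chunk: expected CRLF at offset {i}, found {line[i:i + 2]!r}")
    return size
```

Calling parse_chunk_line(b"5;\r\n") raises ValueError at the chunk-ext-name step, at offset 2, where the CR sits.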
Real-World Smuggling Scenario
A bare semicolon creates ambiguity in how parsers determine chunk boundaries:
Attack vector: A front-end proxy encounters 5;\r\n and skips the empty extension. It treats the line as 5\r\n followed by 5 bytes of chunk data, but forwards the original bytes. A back-end parser then sees the raw 5;\r\n and either (a) rejects it, leaving the two hops out of sync on the connection, or (b) interprets the semicolon differently. Some parsers treat the semicolon as the start of an extension and scan forward for the name, potentially consuming the CRLF and chunk-data bytes as part of the extension name.
This type of chunk-extension parsing ambiguity was documented in the PortSwigger research on HTTP request smuggling, where malformed chunk extensions caused front-end/back-end disagreements on message framing.
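To make option (b) of the attack vector concrete, here is a toy chunked decoder in Python. The ext_skip parameter is a hypothetical way of modeling a parser that treats CR and LF as skippable whitespace after ';'. No particular product is claimed to behave this way.

```python
TCHAR = frozenset(b"!#$%&'*+-.^_`|~0123456789"
                  b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz")
HEXDIG = frozenset(b"0123456789ABCDEFabcdef")

BODY = b"5;\r\nhello\r\n0\r\n\r\n"   # the probe's chunked body


def decode(body: bytes, ext_skip: bytes) -> tuple:
    """Toy chunked decoder. After ';' it skips bytes in `ext_skip`, then reads
    a (possibly empty) extension name. Trailer fields are not supported.

    Returns (bytes taken as chunk-data, whether the body was complete).
    """
    pos, data = 0, b""
    while True:
        start = pos
        while pos < len(body) and body[pos] in HEXDIG:
            pos += 1
        if pos == start:
            raise ValueError("expected chunk-size")
        size = int(body[start:pos], 16)
        if body[pos:pos + 1] == b";":
            pos += 1
            while pos < len(body) and body[pos] in ext_skip:
                pos += 1
            while pos < len(body) and body[pos] in TCHAR:
                pos += 1
        if body[pos:pos + 2] != b"\r\n":
            raise ValueError("expected CRLF after chunk-size line")
        pos += 2
        if size == 0:                                # last-chunk: expect final CRLF
            return data, body[pos:pos + 2] == b"\r\n"
        if pos + size + 2 > len(body):               # chunk runs past what arrived
            return data + body[pos:], False
        data += body[pos:pos + size]
        if body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("expected CRLF after chunk-data")
        pos += size + 2


front_end = decode(BODY, ext_skip=b" \t")        # tolerates the empty name
back_end = decode(BODY, ext_skip=b" \t\r\n")     # hypothetical bug: skips CR/LF too
print(front_end)   # (b'hello', True)
print(back_end)    # (b'0\r\n\r\n', False)
```

The front-end sees a complete body containing hello. The buggy back-end reads hello as the extension name, takes the five bytes 0\r\n\r\n as chunk-data, and keeps waiting for more input. The two hops therefore no longer agree on where this request ends.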