The Sharpest Fact

Bijou64, a new variable-length integer encoding, runs 2–10 times faster than the widely used LEB128 encoding in benchmarks and is more secure by design: every integer has exactly one bijou64 encoding.

### LEB128 Limitations

Many binary protocols need to encode integers that are usually small but occasionally large. Variable-length integer encodings, or varints, solve this, but most designs treat canonicality as an afterthought: something enforced by a runtime check in the decoder rather than by the structure of the encoding itself. The most common varint, LEB128, encodes a number as a sequence of 7-bit groups, with the high bit of each byte signaling “more bytes follow.” Nothing in that structure stops an encoder from padding with extra zero groups, so the same integer can have multiple encodings (1 can be written as 0x01 or as 0x81 0x00). This causes problems for cryptographically signed or hashed data, where the signature or hash covers the bytes rather than the value, so two different byte strings can represent the same message.
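A minimal unsigned LEB128 sketch in Rust makes the ambiguity concrete. The function names are our own, not taken from any particular library, and the decoder deliberately omits a canonicality check, as simple decoders often do. It accepts 0x01, 0x81 0x00, and 0x81 0x80 0x00 alike, and all three decode to 1.

```rust
/// Encode an unsigned integer as LEB128: 7 bits per byte, least-significant
/// group first, high bit set on every byte except the last.
fn leb128_encode(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7F) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte); // high bit clear: this is the last byte
            return out;
        }
        out.push(byte | 0x80); // high bit set: more bytes follow
    }
}

/// Naive LEB128 decoder: accumulates 7-bit groups until a byte with the
/// high bit clear. It performs NO canonicality check, so padded (overlong)
/// encodings decode to the same value as the canonical one.
/// Returns (value, bytes consumed), or None on truncated or oversized input.
fn leb128_decode_naive(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in bytes.iter().enumerate() {
        let shift = 7 * i as u32;
        if shift >= 64 {
            return None; // more groups than a u64 can hold
        }
        value |= u64::from(byte & 0x7F) << shift;
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // input ended before a final byte
}
```

Calling `leb128_decode_naive(&[0x81, 0x00])` returns `Some((1, 2))`: the same value as the one-byte canonical form, just with one more byte consumed.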

### Bijou64 Solution

Bijou64 uses two tricks to eliminate multiple encodings for the same integer. A first byte in the range 0–247 is the value itself: the byte 0x42 simply decodes to 0x42. A first byte in the range 248–255 is instead a tag that says how many bytes follow it, and those bytes hold the number. This makes decoding straightforward, because the decoder knows exactly how many bytes to read, and so how much memory it needs, as soon as it sees the first byte.
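The layout can be sketched in Rust as follows. This is an illustration under stated assumptions, not the published crate's code. The function names are hypothetical. Tag 247 + n is assumed to announce n payload bytes (248 for 1 byte, up to 255 for 8 bytes), and the payload is assumed to be little-endian. Each tag's range is also assumed to begin where the previous one ends, so tag 248 covers 248–503 rather than 0–255. That last assumption is one way to make the encoding canonical by construction: the tiers never overlap, so no value has a second, longer encoding.

```rust
/// Largest value stored directly in the first byte.
const SINGLE_BYTE_MAX: u64 = 247;

/// First value covered by the tag announcing `len` payload bytes (1..=8).
/// ASSUMPTION: each tier starts where the previous one ends, so the
/// ranges never overlap and every value has exactly one encoding.
fn tier_offset(len: u32) -> u64 {
    let mut offset = SINGLE_BYTE_MAX + 1; // 248: first value that needs a tag
    for j in 1..len {
        offset += 1u64 << (8 * j); // skip the 256^j values of tier j
    }
    offset
}

/// Total encoded size, known from the first byte alone.
fn bijou64_encoded_len(first: u8) -> usize {
    if u64::from(first) <= SINGLE_BYTE_MAX {
        1
    } else {
        1 + usize::from(first - 247) // ASSUMPTION: tag 248..=255 -> 1..=8 payload bytes
    }
}

fn bijou64_encode(value: u64) -> Vec<u8> {
    if value <= SINGLE_BYTE_MAX {
        return vec![value as u8]; // 0..=247: the byte is the value
    }
    // Shortest payload length whose tier contains `value` (8 is the catch-all).
    let mut len: u32 = 1;
    while len < 8 && value - tier_offset(len) >= 1u64 << (8 * len) {
        len += 1;
    }
    let payload = value - tier_offset(len);
    let mut out = Vec::with_capacity(1 + len as usize);
    out.push(247 + len as u8); // tag byte
    out.extend_from_slice(&payload.to_le_bytes()[..len as usize]); // ASSUMPTION: little-endian
    out
}

/// Returns (value, bytes consumed), or None if the input is truncated or
/// an 8-byte payload plus its offset would overflow u64.
fn bijou64_decode(bytes: &[u8]) -> Option<(u64, usize)> {
    let first = *bytes.first()?;
    let total = bijou64_encoded_len(first);
    if total == 1 {
        return Some((u64::from(first), 1));
    }
    let len = total - 1;
    let mut buf = [0u8; 8];
    buf[..len].copy_from_slice(bytes.get(1..total)?);
    let payload = u64::from_le_bytes(buf);
    // Only the 8-byte tier can actually overflow here.
    let value = payload.checked_add(tier_offset(len as u32))?;
    Some((value, total))
}
```

Under these assumptions, `bijou64_encode(248)` yields `[248, 0x00]`, 503 yields `[248, 0xFF]`, and 504 moves up to `[249, 0x00, 0x00]`. Only the 8-byte tier can overflow a u64 once its offset is added, which is why the decoder uses a checked addition.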

### Performance Comparison

Bijou64 was benchmarked against LEB128 on ARM (Apple M2 Pro) and x86 (AMD Zen 5) and was 2–10 times faster. Small numbers, which fit in a single LEB128 byte, were about twice as fast, while larger numbers, which force LEB128 to scan continuation bits across many bytes, were around 8–10 times faster.
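Comparing encoded lengths shows how the gap widens with magnitude. LEB128 carries 7 payload bits per byte, so a u64 can take up to 10 bytes, each with a continuation bit to test. The `bijou64_len` helper below is hypothetical. It assumes that 0–247 fit in the first byte, that tag 247 + n announces n payload bytes, and that each tag's range starts where the previous one ends. The published spec may differ in detail.

```rust
/// LEB128 length: 7 payload bits per byte, at least one byte.
fn leb128_len(value: u64) -> usize {
    let bits = 64 - value.leading_zeros() as usize; // significant bits
    ((bits + 6) / 7).max(1)
}

/// Hypothetical bijou64 length. ASSUMES 0..=247 fit in the first byte and
/// tag 247 + n covers the next 256^n values after the previous tier.
fn bijou64_len(value: u64) -> usize {
    if value <= 247 {
        return 1;
    }
    let mut tier_start: u64 = 248;
    for payload_len in 1..8u32 {
        let tier_size = 1u64 << (8 * payload_len); // 256^payload_len values
        if value - tier_start < tier_size {
            return 1 + payload_len as usize; // tag byte + payload
        }
        tier_start += tier_size;
    }
    9 // tag 255 followed by 8 payload bytes
}
```

For example, 200 takes two LEB128 bytes but one bijou64 byte. Under these assumptions, `u64::MAX` takes 10 LEB128 bytes versus 9 for bijou64.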

### Benchmarks

The C benchmark used a uniform distribution over the full u64 range, which is about as adversarial as a benchmark gets for a varint, since nearly every value lands in the longest encodings. Bijou64 processed a batch of 4096 values in ~3 µs (≈0.75 ns per value), while LEB128 took ~30 µs (≈7.3 ns per value). The cumulative distribution functions (CDFs) of the timings tell the variance story. Bijou64’s CDFs are nearly vertical, meaning its run times are tightly clustered, while LEB128’s curves lean over and trail off to the right, meaning slower and more variable runs.
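The per-value figures are the batch time divided by the batch size, and a timing CDF is simply the sorted list of per-batch times. The Rust sketch below is not the article's C harness and only shows that bookkeeping. The xorshift64 generator (Marsaglia's 13/7/17 variant) stands in for whatever uniform u64 source the benchmark actually used, and all function names are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Marsaglia xorshift64 (shifts 13, 7, 17): a cheap stand-in for a
/// uniform u64 source. The state must never be zero.
fn xorshift64(state: &mut u64) -> u64 {
    let mut x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    x
}

/// A batch of `n` pseudo-random values spread over the full u64 range.
fn uniform_batch(seed: u64, n: usize) -> Vec<u64> {
    let mut state = seed | 1; // force a nonzero state
    (0..n).map(|_| xorshift64(&mut state)).collect()
}

/// Run `work` once per sample and return the elapsed times sorted
/// ascending; entry i is then the (i + 1) / runs point of the empirical CDF.
fn timing_cdf<F: FnMut()>(runs: usize, mut work: F) -> Vec<Duration> {
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        work();
        samples.push(start.elapsed());
    }
    samples.sort();
    samples
}

/// Read an approximate q-quantile (0.0..=1.0) off a sorted, non-empty sample.
fn quantile(sorted: &[Duration], q: f64) -> Duration {
    let idx = ((sorted.len() - 1) as f64 * q).round() as usize;
    sorted[idx]
}

/// Average cost per value for one batch, in nanoseconds.
fn ns_per_value(batch_time: Duration, batch_len: usize) -> f64 {
    batch_time.as_nanos() as f64 / batch_len as f64
}
```

A “nearly vertical” CDF shows up here as `quantile(&cdf, 0.01)` and `quantile(&cdf, 0.99)` being close together. A curve that trails off to the right has its high quantiles far above the median.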

### Security Considerations

Bijou64 decoding is canonical by construction, so for all but the largest numbers it gets the security benefit without a separate check. LEB128, in contrast, needs additional work to enforce canonicality: a strict decoder must reject overlong encodings, such as a trailing 0x00 byte after a continuation byte. And because there is no continuation-bit scanning, the bijou64 decoder knows immediately how many bytes to read, and the encoder knows immediately how many to write.
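The extra work on the LEB128 side can be sketched as a strict decoder. This is our own illustration, not a specific library's implementation, and real libraries differ in which overlong forms they reject. Besides testing every continuation bit, the decoder must reject a trailing zero group and check that the tenth byte carries no more than the single remaining bit of a u64.

```rust
/// Strict (canonical-only) unsigned LEB128 decoder.
/// Extra checks compared with a plain decode loop:
///   1. overlong: a final 0x00 byte after a continuation byte means padding;
///   2. overflow: the 10th byte may only carry bit 63 of the value.
/// Returns (value, bytes consumed), or None for non-canonical,
/// overflowing, or truncated input.
fn leb128_decode_strict(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in bytes.iter().enumerate() {
        if i == 9 && byte > 0x01 {
            return None; // would overflow u64 or continue past 10 bytes
        }
        value |= u64::from(byte & 0x7F) << (7 * i as u32);
        if byte & 0x80 == 0 {
            if i > 0 && byte == 0 {
                return None; // overlong: trailing zero group
            }
            return Some((value, i + 1));
        }
    }
    None // truncated input
}
```

So `leb128_decode_strict(&[0x81, 0x00])` returns `None`, while `&[0x01]` decodes to 1. A bijou64 decoder, by contrast, learns the length from the first byte and copies a fixed number of bytes without testing a continuation bit on each one.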

### The Future of Bijou64

Bijou64 is not the most compact varint for every distribution, but it outperforms LEB128 in these benchmarks. Because every integer has exactly one encoding, it is a better choice for applications where canonicality matters. The library is published as bijou64 on crates.io under a dual MIT/Apache-2.0 license, and the spec is licensed CC BY-SA 4.0.