19 Jan 2020
Last week I wrote about how capnproto-rust might relax its memory alignment requirements and what the performance cost of that might look like. The ensuing discussion taught me that memory alignment issues can be thornier than I had thought, and it strengthened my belief that capnproto-rust users ought to be shielded from such issues. Since then, working with the helpful feedback of many people, I have implemented what I consider to be a satisfactory resolution to the problem. Today I’m releasing it as part of capnproto-rust version 0.12. The new version not only provides a safe interface for unaligned memory, but also maintains high performance for aligned memory.
Cargo supports a feature-flags mechanism, whereby a crate can declare parts of its functionality to be optional, with enablement or disablement happening at compile time.
As of version 0.12, the capnp crate has a new feature flag called unaligned. When unaligned is enabled, capnp makes no assumptions about the alignment of its data. In particular, it can read a message in place from any array of bytes via read_message_from_flat_slice().
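To make the compile-time switch concrete, here is a minimal sketch of how a crate can select between two code paths with a Cargo feature. A downstream crate would opt in through its dependency entry, for example capnp = { version = "0.12", features = ["unaligned"] }. The function names alignment_requirement and is_acceptable are hypothetical and chosen for illustration; they are not taken from capnp's source.

```rust
/// Hypothetical helper (not part of capnp): the alignment, in bytes, that
/// the compiled-in code path demands of a message segment.
#[cfg(feature = "unaligned")]
pub fn alignment_requirement() -> usize {
    1 // with the feature on, any byte address is acceptable
}

/// Same hypothetical helper, compiled only when the feature is off.
#[cfg(not(feature = "unaligned"))]
pub fn alignment_requirement() -> usize {
    8 // without the feature, segments must start on an 8-byte boundary
}

/// Hypothetical check: does this buffer satisfy the requirement that was
/// selected at compile time?
pub fn is_acceptable(bytes: &[u8]) -> bool {
    (bytes.as_ptr() as usize) % alignment_requirement() == 0
}
```

Exactly one of the two alignment_requirement definitions survives compilation, so the choice costs nothing at run time.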
On the flip side, when unaligned is not enabled, capnp requires that message segments are 8-byte aligned, returning an error if it detects that’s not the case. The 8-byte alignment is then used whenever capnp loads or stores a primitive value in a message.
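The difference between the two modes can be sketched with a pair of loads for a 64-bit little-endian value. This is an illustration under my own assumptions, not capnp's actual implementation: AlignedBuf, load_u64_aligned, and load_u64_unaligned are hypothetical names, and the error type is a plain String.

```rust
/// Hypothetical 16-byte buffer whose start is guaranteed to be 8-byte aligned.
#[repr(C, align(8))]
pub struct AlignedBuf(pub [u8; 16]);

/// Aligned mode (sketch): reject a misaligned address with an error, then
/// perform a direct 8-byte load that relies on the verified alignment.
pub fn load_u64_aligned(bytes: &[u8], offset: usize) -> Result<u64, String> {
    let end = offset
        .checked_add(8)
        .ok_or_else(|| "offset overflow".to_string())?;
    let chunk = bytes
        .get(offset..end)
        .ok_or_else(|| "out of bounds".to_string())?;
    let ptr = chunk.as_ptr();
    if (ptr as usize) % 8 != 0 {
        return Err(format!("address {:p} is not 8-byte aligned", ptr));
    }
    // SAFETY: `chunk` covers 8 readable bytes and the alignment was checked.
    let raw = unsafe { (ptr as *const u64).read() };
    Ok(u64::from_le(raw))
}

/// Unaligned mode (sketch): copy the bytes out and decode them, which is
/// valid for any address. Returns None if the range is out of bounds.
pub fn load_u64_unaligned(bytes: &[u8], offset: usize) -> Option<u64> {
    let chunk = bytes.get(offset..offset.checked_add(8)?)?;
    let mut word = [0u8; 8];
    word.copy_from_slice(chunk);
    Some(u64::from_le_bytes(word))
}
```

On targets with cheap unaligned access, the copying version typically compiles to a single load as well; on others it may expand to several byte loads, which is where the extra instructions come from.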
With the new interface, there is no longer a need for the problematic unsafe fn Word::bytes_to_words(), so that method no longer exists.
The downside of enabling the unaligned feature is that some operations require more instructions on certain compilation targets.
To better understand the performance cost, I ran capnproto-rust’s benchmark suite on three different computers: my laptop (x86_64), an EC2 ARM64 instance (aarch64), and a Raspberry Pi Zero (armv6). I compared three different capnproto-rust versions: 0.11, 0.12, and 0.12 with unaligned.
As expected, on all of the computers the 0.12 version without the unaligned feature performed about the same as version 0.11 (within measurement noise).
When I enabled the unaligned
feature, the only computer where there
was a noticeable performance impact was the Raspberry Pi,
where the benchmarks slowed down between 10 and 20 percent.
This also was within my expectations, though I had been hoping
it would be lower. (If the performance impact had been negligible,
I would likely not have bothered to make unaligned
an optional feature; instead
I would have made it the only supported mode.)
Following ralfj’s suggestion, I also performed some testing with miri to increase my confidence that there is no lurking undefined behavior. I added some tests that specifically force 1-byte alignment.
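As a rough idea of what forcing 1-byte alignment in a test can look like, the sketch below places a payload at an odd address inside a larger heap buffer and then decodes a value from it. The helpers place_at_odd_address and decode_u64_le are hypothetical names of mine, not the tests that were added to capnproto-rust.

```rust
/// Hypothetical test helper: copies `payload` into a fresh heap buffer so
/// that the copy starts at an odd address. Returns the backing buffer and
/// the offset at which the payload begins.
pub fn place_at_odd_address(payload: &[u8]) -> (Vec<u8>, usize) {
    // One spare byte lets us shift the payload by one if the base is even.
    let mut backing = vec![0u8; payload.len() + 1];
    let base = backing.as_ptr() as usize;
    let start = if base % 2 == 0 { 1 } else { 0 };
    backing[start..start + payload.len()].copy_from_slice(payload);
    (backing, start)
}

/// Alignment-agnostic decode of the first 8 bytes as a little-endian u64.
/// Returns None if fewer than 8 bytes are available.
pub fn decode_u64_le(bytes: &[u8]) -> Option<u64> {
    let mut word = [0u8; 8];
    word.copy_from_slice(bytes.get(..8)?);
    Some(u64::from_le_bytes(word))
}
```

Returning the Vec by value does not move its heap allocation, so the odd starting address remains valid for the caller. A test body can then assert that the slice really is misaligned before exercising the reader on it.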
I was pleasantly surprised to learn how easy it is to run miri these days:
$ rustup component add miri
$ cargo miri test
I recommend that you try this on your own projects!