27 Dec 2014
Rust’s mutable references provide exclusive access to writable locations in memory.
If we have x : &'a mut T,
then we know that the referred-to T
cannot be read or modified except through dereferencing x.
In other words, x
can have no aliases.
This guarantee is
crucial for memory safety,
as it implies that
any mutations we apply through
x
have no risk of invalidating references held
elsewhere in our program.
The Builder
types
of capnproto-rust
also need to provide an exclusivity guarantee.
Recall that if Foo
is a struct defined in a Cap’n Proto schema,
then a foo::Builder<'a>
provides access to a writable location
in arena-allocated memory that contains
a Foo
in Cap’n Proto format.
To protect access to that memory, a foo::Builder<'a>
ought to behave
as if it were a &'a mut Foo,
even though the Foo
type
cannot directly exist in Rust
(because Cap’n Proto struct layout
differs from Rust struct layout).
So the question arises: how do we define custom mutable references?
As we’ll see, the easy part is ensuring exclusive access,
which can be achieved simply by not
implementing the Copy
trait for foo::Builder<'a>.
The tricky part is making it ergonomic to reuse a reference,
something that built-in mutable references achieve through
special automatic reborrowing semantics. Our custom references
can use similar semantics, but they need to be slightly more explicit about it.
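For comparison, here is a minimal sketch of the automatic reborrowing that built-in mutable references get. The function names `bump` and `bump_twice` are hypothetical and exist only for illustration. Passing `x` where a `&mut i32` is expected does not move `x`; the compiler treats the call as if it had been handed `&mut *x`.

```rust
// Hypothetical helper: mutates the integer behind a built-in mutable reference.
fn bump(x: &mut i32) {
    *x += 1;
}

// `x` can be handed out twice, because each call only reborrows it.
pub fn bump_twice(x: &mut i32) -> i32 {
    bump(x); // implicit reborrow: behaves like `bump(&mut *x)`
    bump(&mut *x); // the same reborrow, written out explicitly
    *x
}
```

A custom reference type gets no such implicit step, which is why the explicit method discussed later is needed.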
Okay, let’s get concrete. Suppose that Foo
is defined in a Cap’n Proto schema like this:
struct Foo {
  x @0 : Float32;
  blob @1 : Data;
}
When we call capnp compile -orust foo.capnp, we get generated code
containing the following definitions:
mod foo {
    pub struct Builder<'a> {...}
    impl <'a> Builder<'a> {
        pub fn get_x(self) -> f32 {...}
        pub fn set_x(&mut self, value : f32) {...}
        pub fn get_blob(self) -> ::capnp::data::Builder<'a> {...}
        pub fn set_blob(&mut self, value : ::capnp::data::Reader) {...}
        pub fn init_blob(self, length : u32) -> ::capnp::data::Builder<'a> {...}
        ...
    }
    ...
}
You see here the usual accessor methods that allow us to
read and modify a Foo.
Note that the get_
and init_
methods take a by-value self
parameter.
This ensures that at most one ::capnp::data::Builder
referring to the blob
field
can be obtained.
For example, if we call foo.init_blob()
then we cannot later call foo.get_blob(),
because foo
moves into the first call
and cannot be used again.
As the ::capnp::data::Builder<'a>
type is in fact just a typedef for &'a mut [u8],
it should be extra clear here why exclusivity is important to maintain.
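To make the by-value `self` idea concrete, here is a small self-contained model of such a builder. It is only a sketch: `FooStorage`, `FooBuilder`, and `init_then_write` are hypothetical stand-ins (a plain Rust struct in place of arena memory in Cap'n Proto format), not the types that capnproto-rust actually generates.

```rust
// Hypothetical stand-in for the memory that holds a Foo.
pub struct FooStorage {
    pub blob: Vec<u8>,
}

// A hand-rolled "mutable reference" to a FooStorage.
// It deliberately implements neither Copy nor Clone.
pub struct FooBuilder<'a> {
    storage: &'a mut FooStorage,
}

impl<'a> FooBuilder<'a> {
    pub fn new(storage: &'a mut FooStorage) -> FooBuilder<'a> {
        FooBuilder { storage }
    }

    // Consumes the builder and returns a slice with the full lifetime 'a.
    pub fn init_blob(self, length: usize) -> &'a mut [u8] {
        let FooBuilder { storage } = self;
        storage.blob = vec![0; length];
        storage.blob.as_mut_slice()
    }

    // Also consumes the builder, so it cannot be called after init_blob().
    pub fn get_blob(self) -> &'a mut [u8] {
        let FooBuilder { storage } = self;
        storage.blob.as_mut_slice()
    }
}

pub fn init_then_write(storage: &mut FooStorage) -> u8 {
    let foo = FooBuilder::new(storage);
    let blob = foo.init_blob(4);
    blob[0] = 42;
    // foo.get_blob(); // would not compile: `foo` was moved into init_blob()
    blob[0]
}
```

Because the only way to reach the blob is to give up the builder, two live `&mut [u8]` slices pointing at the same bytes cannot coexist in this model.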
One thing we might do with these accessors is
initialize the Foo
and return a reference to its interior,
as does this function:
fn init_and_return_slice<'a>(foo : foo::Builder<'a>) -> &'a mut [u8] {
    foo.init_blob(100).slice_mut(5, 10)
}
But what if we want to call this function and
then afterwards call set_x()?
We might write something like this:
fn do_some_things_wrong<'a>(mut foo : foo::Builder<'a>) {
    {
        let slice = init_and_return_slice(foo);
        slice[0] = 42;
    }
    foo.set_x(1.23);
}
but if we try to compile this function, we get the following typecheck error:
main.rs:19:9: 19:12 error: use of moved value: `foo`
main.rs:19         foo.set_x(1.23);
                   ^~~
The same pass-by-move semantics that were essential to preventing
aliasing have now become a problem.
We would like to be
able to borrow foo
for just the inner block,
and then reuse it for the final line.
If foo
were a built-in mutable reference, such a reborrow
would take place automatically, and everything would just work.
Fortunately, we can make do with our custom mutable reference
if we use the following function,
which is also included in the generated code:
mod foo {
    ...
    impl <'a> Builder <'a> {
        pub fn borrow<'b>(&'b mut self) -> Builder<'b> { ... }
    }
    ...
}
Using this, we can write our function as follows, and it successfully typechecks.
fn do_some_things_right<'a>(mut foo : foo::Builder<'a>) {
    {
        let slice = init_and_return_slice(foo.borrow());
        slice[0] = 42;
    }
    foo.set_x(1.23);
}
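The same pattern can be tried out without any generated code. The following is a minimal sketch under stated assumptions: `FooStorage`, `FooBuilder`, and `do_some_things` are hypothetical stand-ins for the real arena-backed types, and the explicit reborrowing method is called `reborrow` here, matching the method's later name.

```rust
// Hypothetical stand-in for the memory that holds a Foo.
pub struct FooStorage {
    pub x: f32,
    pub blob: Vec<u8>,
}

// A custom mutable reference; deliberately not Copy.
pub struct FooBuilder<'a> {
    storage: &'a mut FooStorage,
}

impl<'a> FooBuilder<'a> {
    pub fn set_x(&mut self, value: f32) {
        self.storage.x = value;
    }

    // By-value self: the returned slice may live as long as 'a.
    pub fn init_blob(self, length: usize) -> &'a mut [u8] {
        let FooBuilder { storage } = self;
        storage.blob = vec![0; length];
        storage.blob.as_mut_slice()
    }

    // Explicit reborrow: a fresh builder that is valid only for 'b,
    // during which `self` is mutably borrowed and therefore unusable.
    pub fn reborrow<'b>(&'b mut self) -> FooBuilder<'b> {
        FooBuilder { storage: &mut *self.storage }
    }
}

fn init_and_return_slice<'a>(foo: FooBuilder<'a>) -> &'a mut [u8] {
    let blob = foo.init_blob(100);
    &mut blob[5..10]
}

pub fn do_some_things(storage: &mut FooStorage) {
    let mut foo = FooBuilder { storage };
    {
        // Only the short-lived reborrowed builder is moved into the call.
        let slice = init_and_return_slice(foo.reborrow());
        slice[0] = 42;
    }
    // The inner borrow has ended, so `foo` is usable again.
    foo.set_x(1.23);
}
```

Replacing `foo.reborrow()` with plain `foo` in `do_some_things` should reproduce the use-of-moved-value error shown earlier.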
So it appears that the main inconveniences of using our custom mutable references,
compared to built-in mutable references,
are that we need to add some calls to .borrow()
and maybe declare some bindings as mut.
In fact, it seems to me that it would be possible for Rust to support
a built-in Reborrow
trait that could eliminate even these
inconveniences.
Finally, in case you’re wondering why we prefer by-move self
over &mut self
in our generated accessor methods, suppose that we also define this type in our schema:
struct Bar {
  oneFoo @0 : Foo;
}
Using by-move self
allows us to return references deep in the interior of a Bar, like this:
fn init_field_and_return_slice<'a>(bar : bar::Builder<'a>) -> &'a mut [u8] {
    bar.init_one_foo().init_blob(100).slice_mut(5, 10)
}
If init_blob()
instead took a &mut self
parameter, this function would fail to typecheck
because the foo::Builder
returned by bar.init_one_foo()
does not live long enough.
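The difference between the two signatures can be seen in a small model. This is a sketch, not the generated code: `FooStorage`, `BarStorage`, `FooBuilder`, `BarBuilder`, and the `init_blob_borrowed` variant are all hypothetical names introduced for illustration.

```rust
// Hypothetical stand-ins for the memory that holds a Foo and a Bar.
pub struct FooStorage {
    pub blob: Vec<u8>,
}
pub struct BarStorage {
    pub one_foo: FooStorage,
}

pub struct FooBuilder<'a> {
    storage: &'a mut FooStorage,
}
pub struct BarBuilder<'a> {
    storage: &'a mut BarStorage,
}

impl<'a> BarBuilder<'a> {
    pub fn new(storage: &'a mut BarStorage) -> BarBuilder<'a> {
        BarBuilder { storage }
    }

    // By-value self: the inner builder inherits the full lifetime 'a.
    pub fn init_one_foo(self) -> FooBuilder<'a> {
        let BarBuilder { storage } = self;
        storage.one_foo.blob.clear();
        FooBuilder { storage: &mut storage.one_foo }
    }
}

impl<'a> FooBuilder<'a> {
    // By-value self: the slice lives for 'a, independent of the builder value.
    pub fn init_blob(self, length: usize) -> &'a mut [u8] {
        let FooBuilder { storage } = self;
        storage.blob = vec![0; length];
        storage.blob.as_mut_slice()
    }

    // Borrowed-self variant: the slice cannot outlive the borrow of `self`.
    pub fn init_blob_borrowed(&mut self, length: usize) -> &mut [u8] {
        self.storage.blob = vec![0; length];
        self.storage.blob.as_mut_slice()
    }
}

pub fn init_field_and_return_slice<'a>(bar: BarBuilder<'a>) -> &'a mut [u8] {
    let blob = bar.init_one_foo().init_blob(100);
    // Using init_blob_borrowed(100) on the line above would be rejected:
    // the temporary FooBuilder is dropped at the end of that statement
    // while `blob` would still be borrowing from it.
    &mut blob[5..10]
}
```

In this model the by-value chain works because each builder hands its `'a` borrow on to the next value instead of lending out a shorter one.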
The borrow()
method has been renamed
to reborrow().