0.18 — lazy UTF-8 and no-alloc

04 Sep 2023

New release alert! Version 0.18 of capnproto-rust is now available on crates.io.

If you use capnproto-rust on data with the Text built-in type, then it’s likely that this release will require some updates to your code. But don’t worry — the changes are straightforward and they bring some important benefits.

lazy UTF-8 validation

Suppose we have the following struct defined in a Cap’n Proto schema:

struct Foo {
  oneText @0 :Text;
  anotherText @1 :Text;
}

Then, in Rust, these Text fields can be accessed through the text::Reader type:

let my_foo: foo::Reader = ...;
let one_text: capnp::text::Reader<'_> = my_foo.get_one_text()?;
let another_text: capnp::text::Reader<'_> = my_foo.get_another_text()?;

But what exactly is a text::Reader?

the old definition

In previous versions of capnproto-rust, the text::Reader type was an alias to Rust’s &str type:

pub mod text {
  pub type Reader<'a> = &'a str;
}

At first glance, this seems like a perfect fit. A Cap’n Proto Text value is required to contain valid UTF-8 data, just like a Rust &str, and a text::Reader is meant to represent a reference to that data.

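However, in practice this representation falls short in some ways: producing a &str forces UTF-8 validation on every access, and it leaves no way to get at the raw bytes when that validation fails.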

the new definition

To address the above-noted shortcomings, version 0.18 of capnproto-rust defines text::Reader like this:

pub mod text {
  /// Wrapper around utf-8 encoded text.
  /// This is defined as a tuple struct to allow pattern matching
  /// on it via byte literals (for example `text::Reader(b"hello")`).
  #[derive(Copy, Clone, PartialEq)]
  pub struct Reader<'a>(pub &'a [u8]);

  impl<'a> Reader<'a> {
    pub fn as_bytes(self) -> &'a [u8] { ... }
    pub fn to_str(self) -> Result<&'a str, Utf8Error> { ... }
    pub fn to_string(self) -> Result<String, Utf8Error> { ... }
  }

  impl<'a> From<&'a str> for Reader<'a> { ... }
  impl<'a> From<&'a [u8]> for Reader<'a> { ... }
}

Now consumers can easily access the underlying data, via as_bytes(), and getting it as a &str or String just requires an extra to_str() or to_string() call.
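To make the lazy behavior concrete, here is a minimal sketch that runs without the capnp crate. It uses a local stand-in type with the same shape as the text::Reader definition above; the stand-in, the classify helper, and the sample bytes are illustrative assumptions, not code from capnproto-rust.

```rust
use std::str::Utf8Error;

/// Local stand-in that mirrors the shape of `capnp::text::Reader` in 0.18.
/// It is NOT the real type; it exists only so this sketch is self-contained.
#[derive(Copy, Clone, PartialEq)]
pub struct Reader<'a>(pub &'a [u8]);

impl<'a> Reader<'a> {
    /// Returns the raw bytes without any UTF-8 check.
    pub fn as_bytes(self) -> &'a [u8] {
        self.0
    }

    /// Validates UTF-8 only when the caller asks for a `&str`.
    pub fn to_str(self) -> Result<&'a str, Utf8Error> {
        std::str::from_utf8(self.0)
    }
}

impl<'a> From<&'a str> for Reader<'a> {
    fn from(s: &'a str) -> Self {
        Reader(s.as_bytes())
    }
}

/// Hypothetical helper: dispatches on a text value via byte-literal
/// patterns, so no UTF-8 validation happens at all.
fn classify(text: Reader<'_>) -> &'static str {
    match text {
        Reader(b"hello") => "greeting",
        Reader(b"") => "empty",
        _ => "other",
    }
}

fn main() {
    let valid: Reader<'_> = "hello".into();
    assert_eq!(valid.to_str().unwrap(), "hello");
    assert_eq!(classify(valid), "greeting");

    // Invalid UTF-8 is still readable as bytes; only to_str() reports the error.
    let invalid = Reader(&[0xff, 0xfe]);
    assert_eq!(invalid.as_bytes(), &[0xff_u8, 0xfe][..]);
    assert!(invalid.to_str().is_err());
    assert_eq!(classify(invalid), "other");
}
```

The point of the sketch is that the cost and the failure mode of validation both move to the to_str() call site, so code that only forwards or compares bytes never pays for it.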

When setting text fields in a message, you will now need to insert some .into() calls to convert from a str or String into a text::Reader, like this:

let name: &str = "alice";
let mut my_foo: foo::Builder = ...;
my_foo.set_one_text("hello world".into())?;
my_foo.set_another_text(format!("hello {name}")[..].into())?;

All this is admittedly more verbose than it was before, but it’s in keeping with the general spirit of capnproto-rust: we are willing to introduce some verbosity if that’s what it takes to model Cap’n Proto data in a satisfactory way.
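The conversions themselves can be tried in isolation. The sketch below again uses a local stand-in for text::Reader with the two From impls listed earlier, and a hypothetical set_text function in place of a generated setter; both are assumptions made so the example compiles on its own.

```rust
/// Local stand-in mirroring the `text::Reader` shape from this post.
#[derive(Copy, Clone, PartialEq)]
pub struct Reader<'a>(pub &'a [u8]);

impl<'a> From<&'a str> for Reader<'a> {
    fn from(s: &'a str) -> Self {
        Reader(s.as_bytes())
    }
}

impl<'a> From<&'a [u8]> for Reader<'a> {
    fn from(b: &'a [u8]) -> Self {
        Reader(b)
    }
}

/// Hypothetical setter standing in for a generated `set_*` method.
/// It returns the number of bytes it would copy into the message.
fn set_text(value: Reader<'_>) -> usize {
    value.0.len()
}

fn main() {
    let name = "alice";

    // A string literal is already a &str, so `.into()` is enough.
    assert_eq!(set_text("hello world".into()), 11);

    // An owned String has to be borrowed as &str before converting.
    let owned: String = format!("hello {name}");
    assert_eq!(set_text(owned.as_str().into()), 11);

    // Raw bytes convert the same way, with no UTF-8 check.
    let raw: &[u8] = b"\xffnot utf-8";
    assert_eq!(set_text(raw.into()), 10);
}
```

Borrowing with as_str() is equivalent to the [..] slicing used above; which one reads better is a matter of taste.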

no-alloc mode

Another new feature is no-alloc mode.

In version 0.13, capnproto-rust gained support for no_std environments. However, it still depended on the alloc crate, which can sometimes be a problem for microcontroller targets and kernel programming. (See this issue for some discussion.)

Starting with version 0.18, the capnp crate now has an alloc Cargo feature, which can be disabled to remove the alloc dependency.
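As a rough sketch, a dependency entry along these lines should turn the feature off. It assumes that alloc is enabled through the crate's default features (together with std), so check the capnp crate's feature list before relying on it.

```toml
[dependencies]
# Assumption: `alloc` and `std` are default features, so disabling
# defaults removes both. Re-enable only what your target supports.
capnp = { version = "0.18", default-features = false }
```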

A side benefit of this change is that now error handling in capnproto-rust is much less dependent on heap allocation, and so should have better performance and be more reliable.

-- posted by dwrensha
