28 May 2016
Until recently, the concept of generators, or resumable functions, seemed to me like a cute idea with only niche use cases. Sure, I had heard that generators in Python and JavaScript could make certain things much nicer, but how often does one really need what appears to be nothing more than a fancy iterator? It wasn’t until I followed this Rust RFC thread that the true potential of generators in Rust started to dawn on me. Today they are my personal number-one most-desired new feature for the language, mostly because I believe they are the biggest missing piece in Rust’s async story.
But how are generators relevant to async at all?
To understand that, let’s first imagine what generators might look like in Rust. We start with a trait definition (borrowed from eddyb’s comment in the above-linked thread):
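The exact shape below is my paraphrase of that proposal rather than a settled design, with a hand-written implementation added to make it concrete:

```rust
// The result of resuming a generator: either it yielded a value and
// paused, or it ran to completion with a final return value.
enum Yield<T, R> {
    Value(T),  // the generator produced a value and paused
    Return(R), // the generator finished with a final value
}

trait Generator {
    type Yield;
    type Return;
    // Resume the generator until it next yields or returns.
    fn next(&mut self) -> Yield<Self::Yield, Self::Return>;
}

// Roughly what the compiler would generate for you, written by hand:
// a generator that yields 0, 1, 2 and then returns "done".
struct Counter {
    n: u32,
}

impl Generator for Counter {
    type Yield = u32;
    type Return = &'static str;

    fn next(&mut self) -> Yield<u32, &'static str> {
        if self.n < 3 {
            self.n += 1;
            Yield::Value(self.n - 1)
        } else {
            Yield::Return("done")
        }
    }
}
```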
So a generator looks a lot like an iterator. You can ask it for the next value until you reach its end, signalled by Return.
The interesting part is that there is a special way to construct values that implement Generator. For example:
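A sketch of such a generator function, in the strawman syntax described next (the body is my reconstruction; the arbitrary cutoff of 10 is my own, chosen so the generator eventually finishes):

```rust
// Strawman syntax: `-<u32>->` marks this as a generator that yields
// u32 values; the final expression becomes its return value.
fn fib_with_sum() -<u32>-> u32 {
    let (mut a, mut b) = (0, 1);
    let mut sum = 0;
    while a < 10 {
        yield a; // produce a value and pause here
        sum += a;
        let next = a + b;
        a = b;
        b = next;
    }
    sum // finished: the generator's return value
}
```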
Here I’ve used some strawman syntax fn foo() -<T>-> R, which denotes that foo is a generator function that produces some number of T values and then finishes by producing an R.
The yield keyword inside of a generator function means that a Yield::Value() should be produced and the generator should be paused. When you call fib_with_sum(), the value you get back is a generator, which can be used by calling next(), like this:
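Driving the generator by hand might look roughly like this, matching on the Yield::Value and Yield::Return cases described above (a sketch, not compilable Rust today):

```rust
let mut gen = fib_with_sum();
loop {
    match gen.next() {
        Yield::Value(v) => println!("fib: {}", v),
        Yield::Return(sum) => {
            println!("sum: {}", sum);
            break;
        }
    }
}
```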
Another thing that we might want to do is to have one generator delegate to another generator. The yield from construction allows that:
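The example might have looked something like this; the concrete numbers are my invention, but the structure follows the explanation that comes after it:

```rust
// A sub-generator that yields two values and then returns 3.
fn sub() -<u32>-> u32 {
    yield 10;
    yield 20;
    3
}

fn gen() -<u32>-> () {
    // `yield from` forwards every value sub() yields to our caller;
    // once sub() is done, its return value becomes the value of the
    // `yield from` expression itself.
    let limit = yield from sub();
    let mut j = 0;
    while j < limit {
        yield j;
        j += 1;
    }
    // gen() stops once j == 3
}
```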
Running this gives:
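Assuming a sub-generator that yields 10 and 20 before returning 3, and a calling generator that then counts j up from 0 toward that return value, the output would be along these lines:

```
10
20
0
1
2
```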
Note that when the sub-generator is done, its return value gets plugged in at the yield from expression of the calling generator. So gen() continues until j == 3.
Now for the punchline. Using generators, we can define asynchronous reader and writer traits like this:
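In the strawman generator syntax, such traits might look roughly like this (my sketch: the idea is that an operation yields the file descriptor it is blocked on until it can complete, then returns its ordinary result):

```rust
use std::io;
use std::os::unix::io::RawFd;

// Strawman: `read` is a generator method that yields the file
// descriptor it is waiting on, and finally returns how many bytes
// were read. `AsyncWrite` is symmetric.
trait AsyncRead {
    fn read(&mut self, buf: &mut [u8]) -<RawFd>-> io::Result<usize>;
}

trait AsyncWrite {
    fn write(&mut self, buf: &[u8]) -<RawFd>-> io::Result<usize>;
}
```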
Then, at the top level of our program, we have a task executor, where:

- The task executor owns a collection of tasks and is responsible for running them when they are ready.
- When a task runs and needs to wait on some I/O, it yields back what it’s currently waiting on, for example a file descriptor.
- When none of the tasks can make any progress, the executor calls an OS-specific API like kevent() to wait until more progress can be made.
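To make the shape of this loop concrete, here is a deliberately simplified sketch. It is not real async I/O: the Task trait stands in for generator-based tasks, the "file descriptor" is just an integer, and instead of blocking in kevent() the loop re-queues waiting tasks immediately. All the names here are my own.

```rust
use std::cell::Cell;
use std::collections::VecDeque;
use std::rc::Rc;

// What a resumed task reports back to the executor.
enum Yield<T, R> {
    Value(T),  // blocked: here is the fd I am waiting on
    Return(R), // finished
}

// Stand-in for a generator-based task.
trait Task {
    fn resume(&mut self) -> Yield<i32, ()>;
}

struct Executor {
    tasks: VecDeque<Box<dyn Task>>,
}

impl Executor {
    fn run(&mut self) {
        while let Some(mut task) = self.tasks.pop_front() {
            match task.resume() {
                // A real executor would register the fd with an API
                // like kevent()/epoll and only re-queue the task once
                // the fd is ready; here we just spin.
                Yield::Value(_fd) => self.tasks.push_back(task),
                Yield::Return(()) => {} // task done, drop it
            }
        }
    }
}

// A fake I/O task: pretends to block on fd 0 `remaining` times, then
// finishes, bumping a shared counter so completion can be observed.
struct FakeIo {
    remaining: u32,
    done: Rc<Cell<u32>>,
}

impl Task for FakeIo {
    fn resume(&mut self) -> Yield<i32, ()> {
        if self.remaining > 0 {
            self.remaining -= 1;
            Yield::Value(0)
        } else {
            self.done.set(self.done.get() + 1);
            Yield::Return(())
        }
    }
}
```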
Unlike with fibers/green threading, it remains very clear where switches between tasks can take place. Unlike with promises, we don’t have to be constantly allocating closures on the heap. Generator-based async I/O seems like an all-around win!