Assumed Audience: Hackers, Rustaceans, and anyone interested in programming language design.
Epistemic Status: Quite confident. The ideas are partially implemented and working.
This post is partially an ad!
Introduction
josephg wrote a post about what he’d like to see in future Rust, or another programming language following Rust.
Unbeknownst to josephg, that language is coming into existence right now! It is called Yao.
I already wrote about what else Rust got wrong, but let’s go over the features josephg wants and see how Yao stacks up.
Before I do, in order to avoid misunderstandings, I must admit that I am not completely certain that these ideas are possible. More on that later.
Function Traits (Effects)
I briefly mentioned function traits here, but I should expand on them.
First of all, function traits require Restricted Structured Concurrency (RSC), so if you’re looking for a catch, that is one of them.
The second catch is that every function trait needs to be restrictive. For example, pure is a restrictive trait because it restricts the function from having side effects. On the other hand, async is not restrictive; it actually allows the function to do more.
Why does this matter? Because a function without pure will always be able to call a function with it, while a function without async cannot just call a function with it.
This is actually the root of the function color problem: async is exactly backwards! Instead of marking asynchronous functions, languages should have had us mark synchronous functions, and then async should have been the default!
Although that wouldn’t remove the possibility of bugs when calling a blocking function in async code, which is the real reason I hate async.
Anyway, Yao’s function traits will always be restrictive, and that’s what will make them scale better than Rust’s while avoiding many of the problems, because functions will be able to do anything by default.
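For a rough feel of what "restrictive" means, Rust's const fn is a decent analogy: it limits what the function body may do, so any caller can use it, but it cannot call unrestricted functions itself. The sketch below is Rust, not Yao syntax, and the names square, describe, and SIXTEEN are made up for illustration.

```rust
// Analogy only: `const fn` behaves like a restrictive function trait.
// It limits what the body may do, so every caller is allowed to use it.
const fn square(x: u32) -> u32 {
    x * x
}

// An unrestricted function: it may allocate, do I/O, and so on.
fn describe(x: u32) -> String {
    // Calling "down" into the restricted function is always allowed.
    format!("{} squared is {}", x, square(x))
}

// The reverse direction is rejected by the compiler:
//
// const fn bad(x: u32) -> u32 {
//     describe(x).len() as u32 // compile error: non-const fn called from const fn
// }

// The restricted function also works where only restricted code may run.
const SIXTEEN: u32 = square(4);

fn main() {
    println!("{}", describe(4));
    println!("{}", SIXTEEN);
}
```

Note the direction of the restriction: the marked function is the one that gives something up, which is why unmarked code never has to change to call it.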
Global Context
But RSC gives so much more than just function traits: it can make it possible to safely handle global context.
An example: on both POSIX and Windows, there is a bit of global context called the “current working directory.” It is shared by the whole process and can be changed at any time, which can easily lead to problems, especially in multi-threaded code.
However, Yao code can have multiple current working directories. In fact, each thread can have its own!
To demonstrate this, you can do this:
$ git clone https://git.yzena.com/Yzena/Yc.git
$ cd Yc
$ make
<...>
BOOTSTRAP OK
$ ./release/yc yao tests/yao/cd.yao
/home/gavin/Downloads/Yc
test.txt DOES NOT EXIST
/home/gavin/Downloads/Yc/test1
test.txt DOES NOT EXIST
/home/gavin/Downloads/Yc/test1/test2
test.txt EXISTS
/home/gavin/Downloads/Yc/test1
test.txt DOES NOT EXIST
/home/gavin/Downloads/Yc
That clones this repo, bootstraps Yao and its build system, then runs Yao on this file, which uses the pwd command to demonstrate that the current working directory is, in fact, being changed. Also, notice that the file existence check is relative to the CWD; it only succeeds when the CWD is test1/test2, so path operations are relative to the CWD Yao sets.
And in none of this is Yao setting the global CWD; in fact, none of my code changes that global CWD. Yao is written to use its own CWD rather than the global CWD.
Now, this doesn’t demonstrate separate threads having the same capability, but that’s only because I haven’t implemented thread spawning in Yao yet. But the same capability exists in my C code, which can spawn threads.
This uses something called “context stacks,” a concept I stole from Jonathan Blow and his language, Jai. The “current working directory” is just the directory on the top of the CWD context stack for the thread.
But what if the thread doesn’t have one? Well, it is the directory that was on top of the parent thread’s CWD context stack when the child thread was created, and so on, recursively to the root thread.
And the root thread always has a CWD.
When combined with RSC, context stacks are enormously powerful and incredibly safe! I know that a CWD that is pushed onto a parent’s context stack before a child thread is created will always be on the stack for the entire lifetime of the child.
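Here is a minimal sketch of that lookup rule in Rust, using scoped threads as a stand-in for RSC. It is not Yao's actual C implementation; the CwdStack type and its methods are hypothetical names, and the stack only tracks paths rather than open directories.

```rust
use std::path::{Path, PathBuf};
use std::thread;

// Hypothetical sketch of a per-thread CWD context stack; not Yao's real code.
struct CwdStack<'p> {
    // The parent's stack, which cannot change while this child borrows it.
    parent: Option<&'p CwdStack<'p>>,
    dirs: Vec<PathBuf>,
}

impl<'p> CwdStack<'p> {
    fn root(dir: &str) -> Self {
        CwdStack { parent: None, dirs: vec![PathBuf::from(dir)] }
    }

    fn child_of(parent: &'p CwdStack<'p>) -> Self {
        CwdStack { parent: Some(parent), dirs: Vec::new() }
    }

    // Push a directory relative to the current one.
    fn push(&mut self, dir: &str) {
        let next = self.current().join(dir);
        self.dirs.push(next);
    }

    fn pop(&mut self) {
        self.dirs.pop();
    }

    // Top of this thread's stack, else the parent's top, recursively.
    fn current(&self) -> &Path {
        match (self.dirs.last(), self.parent) {
            (Some(dir), _) => dir,
            (None, Some(parent)) => parent.current(),
            (None, None) => panic!("the root thread always has a CWD"),
        }
    }
}

fn main() {
    let mut root = CwdStack::root("/srv");
    root.push("project");

    // Scoped threads finish before `scope` returns, so the parent's entries
    // are guaranteed to outlive every child that reads them.
    let (inherited, own) = thread::scope(|s| {
        let handle = s.spawn(|| {
            let mut mine = CwdStack::child_of(&root);
            let inherited = mine.current().to_path_buf();
            mine.push("build");
            let own = mine.current().to_path_buf();
            mine.pop();
            (inherited, own)
        });
        handle.join().unwrap()
    });

    println!("{} {}", inherited.display(), own.display());
    println!("{}", root.current().display());
}
```

In this sketch the borrow checker enforces the guarantee described above: the parent cannot pop its stack while a child still holds a reference to it.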
Another example: environments. The environ variable, as well as getenv and setenv, are the source of many footguns.
In Yao, environments are just another context stack:
$ ./release/yc yao tests/yao/env.yao
------------------
------------------
BC_ENV_ARGS=-l
BC_LINE_LENGTH=64
------------------
BC_ENV_ARGS=-l
++++++++++++++++++
BC_ENV_ARGS: -l
------------------
BC_ENV_ARGS=-l
BC_LINE_LENGTH=64
------------------
------------------
BC_ENV_ARGS=-l
BC_LINE_LENGTH=64
------------------
BC_ENV_ARGS: -l
BC_LINE_LENGTH: 64
That uses Yao to run this file, which uses env to demonstrate that the environment is changed for child processes. It also grabs an environment variable with the equivalent of getenv() (env.env in the script) to demonstrate that yes, you can still grab individual environment variables.
You can set whatever you want in the environment before you run that, and it doesn’t matter: the output will be exactly the same (barring any bugs).
Automatic Memory Management Without GC
Another thing that I think RSC gives us, when combined with scope-based resource management, is fully automatic memory management without a garbage collector.
Rust has already proven that this is possible, and I believe, but I’m not completely sure, that RSC would make it easier to reason about.
Regardless, Yao as currently implemented has no leaks. You can run Yao on both of the scripts above under Valgrind to check:
$ ./release/yc rig -Dvalgrind=1
$ valgrind ./build/yc yao tests/yao/env.yao
$ valgrind ./build/yc yao tests/yao/cd.yao
Compile-Time Capabilities
Next, josephg wants compile-time capabilities.
Yao already has them. In fact, it not only has them for functions, but it has them for keywords, which are Yao’s equivalent of macros.
To demonstrate this, run this:
$ ./release/yc yao tools/rig_slam.yao
Panic: Unimplemented
Source: /home/gavin/Downloads/Yc/src/yao/keywords.c:2105
Function: yao_parse_while()
Illegal instruction
That panic happens because it is trying to parse the while loop in this file, and I haven’t implemented parsing while loops yet.
However, say you want to restrict a Yao script; you want to make sure it can’t be fully Turing-complete. And you don’t trust the author because he’s in the cubicle next to yours, and you know he’s dumb.
Yao has a builtin mode that makes the language non-Turing-complete, which removes a lot of stuff. One of those things is while, which makes things Turing-complete.
So you could tell Yao to use the iterative language mode, which enables restricted loops, but not unrestricted loops like while:
$ ./release/yc yao --lang-mode=iterative tools/rig_slam.yao
yc: tools/rig_slam.yao[146:2]
Parse error: Incomplete variable declaration for name: while
yc: tools/rig_slam.yao[146:8]
Invalid token: Expected semicolon (';')
Panic: Unimplemented
Source: /home/gavin/Downloads/Yc/src/yao/keywords.c:1753
Function: yao_parse_if()
Illegal instruction
Another unimplemented panic, but notice that it’s different: Yao tried to parse while as a plain name! It’s as though it didn’t even exist!
So yeah, Yao has compile-time capabilities right now! It does it by simply not even importing the definition of things that are restricted, which means that the restricted code can’t access it, no matter what.
It’s fine-grained too; while is gone, but if is not; that’s where the panic happened! And yes, this shows that it applies to “builtin” keywords too, not just user-defined ones. It can also apply to functions, packages, types, and context stacks!
Although it is only implemented on keywords and packages at the moment.
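The "never import the definition" trick can be sketched in a few lines of Rust. This is not how Yao's parser is actually written; LangMode, keyword_table, and classify are hypothetical names, and foreach is just a placeholder for some restricted loop keyword.

```rust
use std::collections::HashSet;

// Hypothetical language modes; only `iterative` is named in this post.
enum LangMode {
    Full,
    Iterative,
}

#[derive(Debug, PartialEq)]
enum Token {
    Keyword(String),
    Name(String),
}

// Build the set of keywords a mode imports. Restricted keywords are never
// added, so nothing downstream can reach them, no matter what.
fn keyword_table(mode: LangMode) -> HashSet<&'static str> {
    // `foreach` stands in for a restricted (bounded) loop keyword.
    let mut table = HashSet::from(["if", "foreach"]);
    if let LangMode::Full = mode {
        table.insert("while");
    }
    table
}

// A word that is not in the table is just an ordinary name.
fn classify(word: &str, table: &HashSet<&'static str>) -> Token {
    if table.contains(word) {
        Token::Keyword(word.to_string())
    } else {
        Token::Name(word.to_string())
    }
}

fn main() {
    let full = keyword_table(LangMode::Full);
    let iterative = keyword_table(LangMode::Iterative);
    println!("{:?}", classify("while", &full));
    println!("{:?}", classify("while", &iterative));
    println!("{:?}", classify("if", &iterative));
}
```

There is no "is this allowed?" check to forget or bypass in this design; the restricted keyword simply has no definition to find.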
Runtime Capabilities
But if Yao has compile-time capabilities on context stacks, it can prevent code from accessing certain context stacks. What if we used that to go a step further and use context stacks to implement runtime capabilities?
Take the CWD context stack for example. We could use something like it, along with RESOLVE_BENEATH on Linux and O_RESOLVE_BENEATH on FreeBSD, to implement filesystem capabilities at runtime.
What if your program foo started with / as an open directory on a context stack? Then, your setup code could use that directory to call openat() on /etc/foo and /home/$USER/.config/foo, and push both of those open directories onto the context stack.
Then you call a dependency that you don’t trust; you didn’t give it access to that directory context stack, so it can’t change it. Perhaps it does open files, but since it has to use the standard library to do that, and the standard library does have access to that context stack, it can read the top of the context stack and try to open a file in either /etc/foo or /home/$USER/.config/foo. Since it is using {O_}RESOLVE_BENEATH, it can only open files in those two locations.
This means that the untrusted dependency can only open files in the config directories of your foo program; it can’t steal your SSH keys or your crypto wallet. And it certainly can’t encrypt your whole drive and demand a ransom.
And the best part? Because those directories are open directories, foo avoids TOCTTOU bugs on directory operations.
These RSC- and context stack-based runtime capabilities are enormously powerful! You can imagine one for restricting what external commands a dependency can run (only a set of C compilers, for example), or what domain names can be resolved. Or even what IP addresses to connect to.
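To make the "beneath" rule concrete, here is a purely lexical sketch in Rust. It is only a stand-in: the real {O_}RESOLVE_BENEATH check is enforced by the kernel during path resolution and also covers symlinks, which this code does not. The resolve_beneath helper and the example directories are hypothetical.

```rust
use std::path::{Component, Path, PathBuf};

// Hypothetical helper: a lexical approximation of a "beneath" check.
// Unlike the kernel-enforced flag, it knows nothing about symlinks.
fn resolve_beneath(root: &Path, requested: &Path) -> Option<PathBuf> {
    let mut resolved = root.to_path_buf();
    let mut depth = 0usize;
    for part in requested.components() {
        match part {
            Component::Normal(name) => {
                resolved.push(name);
                depth += 1;
            }
            Component::CurDir => {}
            Component::ParentDir => {
                // Refuse to climb above the capability directory.
                if depth == 0 {
                    return None;
                }
                resolved.pop();
                depth -= 1;
            }
            // Absolute paths (and Windows prefixes) would escape the root.
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }
    Some(resolved)
}

fn main() {
    // Stand-ins for the open directories on the context stack.
    let allowed = [PathBuf::from("/etc/foo"), PathBuf::from("/home/user/.config/foo")];
    for request in ["settings.toml", "../../.ssh/id_ed25519", "/etc/passwd"] {
        for dir in &allowed {
            match resolve_beneath(dir, Path::new(request)) {
                Some(path) => println!("allow {}", path.display()),
                None => println!("deny  {} (under {})", request, dir.display()),
            }
        }
    }
}
```

A real implementation would hold directory file descriptors and let the kernel do the resolution, which is what avoids the race conditions a string check like this cannot.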
Distribution
Unfortunately, there is a cost to such awesomeness: Yao can’t be compiled in the traditional way.
Instead, Yao will be compiled to an LLVM-like IR (which already exists), and that’s how it will be distributed. Users will have to do the final compile step on their local machines.
However, there is a nice side effect: Yao will avoid the stupid-long link times that traditional object files suffer from.
Pin, Move, and Struct Borrows
I agree with josephg that Pin is complicated.
Yao will have the same problems, but unlike Rust, there will be another way to avoid Box: since Yao uses RSC, anything allocated on the stack before child threads are created can be safely passed to those child threads.
And lest you think that that could easily blow out the stack, Yao actually has a shadow stack on the heap that is capable of allocations of any size.
In like manner, struct fields can be borrowed by child threads with smaller lifetimes. Easy as pie.
Okay, not completely easy, but easier than Rust!
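Rust's scoped threads give a taste of this today, so here is a small sketch of both ideas: stack data lent to children without Box or Arc, and separate struct fields borrowed by separate children. It is an analogy in Rust, not Yao code, and Totals, sum_in_halves, and bump are made-up names.

```rust
use std::thread;

struct Totals {
    reads: u64,
    writes: u64,
}

// Children borrow slices of the caller's data; the scope joins them before
// returning, so the borrow can never outlive the data.
fn sum_in_halves(values: &[u64]) -> u64 {
    let (left, right) = values.split_at(values.len() / 2);
    thread::scope(|s| {
        let a = s.spawn(|| left.iter().sum::<u64>());
        let b = s.spawn(|| right.iter().sum::<u64>());
        a.join().unwrap() + b.join().unwrap()
    })
}

// Each child mutably borrows a different field of the same struct.
fn bump(totals: &mut Totals) {
    let Totals { reads, writes } = totals;
    thread::scope(|s| {
        s.spawn(|| *reads += 10);
        s.spawn(|| *writes += 20);
    });
}

fn main() {
    // Lives on the parent's stack: no Box, no Arc.
    let values = [1u64, 2, 3, 4, 5, 6];
    println!("{}", sum_in_halves(&values));

    let mut totals = Totals { reads: 0, writes: 0 };
    bump(&mut totals);
    println!("{} {}", totals.reads, totals.writes);
}
```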
Comptime
I wish I had code to demonstrate this, but I haven’t gotten there yet.
But Yao will have comptime!
Unlike Zig, its comptime will not be lazy.
First, some background: one of my first design decisions is that Yao will use currying. It will look like this:
fn curried(n: num) -> (s: str) -> bool
{
    // stuff
}
This is actually how Yao will implement closures because explicit is better than implicit:
n: num = 1;
curried(n);
Anyway, I have always loved the idea of Zig, but hated the execution! I have written Haskell, and I can think in lazy languages, but I don’t like it.
So what if I just used currying to have comptime? Comptime callable code can be marked with a # (like C’s preprocessor), so it would look like this:
fn comp#(n: num) -> (s: str) -> bool
{
    // stuff
}
Then you can call comp#(n) at compile time and get back a compile-time constant function. This is how Yao will implement generic functions.
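For readers who think in Rust, the closest existing analogs are a function returning a closure (runtime currying) and a const generic parameter (the first "argument" fixed at compile time). This is only a rough comparison, not how Yao will work internally, and the length-check body is a placeholder assumption since the Yao examples above leave the body out.

```rust
// Runtime currying: the first call captures `n` explicitly and returns a
// closure that still needs the string. The body is a placeholder.
fn curried(n: usize) -> impl Fn(&str) -> bool {
    move |s| s.len() > n
}

// Rough compile-time analog: `N` is supplied at compile time, so `comp::<3>`
// names a distinct, fully specialized function.
fn comp<const N: usize>(s: &str) -> bool {
    s.len() > N
}

fn main() {
    let longer_than_3 = curried(3);
    println!("{}", longer_than_3("yao"));

    // The specialized function is an ordinary function pointer.
    let fixed: fn(&str) -> bool = comp::<3>;
    println!("{}", fixed("rust"));
}
```

The difference Yao is aiming for is that one syntax covers both cases, with # marking which stage the first call happens in.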
With the addition that comptime functions can take and return types, Yao will get generic types too:
fn vec#(t: type) -> type
{
    return struct {
        array: ^t;
        len: usize;
    };
}
I will also add pure as a function trait for zero side effects, as well as total for functions that are not Turing-complete (guaranteed to halt). If comptime functions are required to be pure and total, then even though Yao’s type system will be powerful, it won’t be Turing-complete!
Conclusion
If all of that is sounding too good to be true, it kind of is.
See, while I have succeeded in implementing some of Yao’s design, I haven’t implemented all of it. So unlike the V creator, I’m going to be very honest, clear, and precise: I do not know if Yao’s design is even possible to implement!
I do have some evidence that it is possible, and I do have some demos, but I do not know for sure!
In addition, while there is some good about RSC and Yao, there is some bad and ugly to them as well.
If Yao works, will it even be worth it with those disadvantages?
Of course, I do think so, but every programmer needs to make that decision personally. It could be that most, or all, just don’t think Yao is worth it.
And I need to be fine with that.
Oh, and if you’re worried about the license on the repo, I expressly give any reader permission to use that code to run the demonstrations in this post. Also, I would like to remove the SSPL and move to an AGPL-like license, but I need lawyer money first.