Please see the disclaimer.
Assumed Audience: Programmers and hackers. Discuss on Hacker News, but please don’t post on lobste.rs because I have been banned.
Epistemic Status: Only slightly confident, but becoming more confident over time. Will update if that changes.
tl;dr: Some complexity belongs in the language server, not the compiler. Also, Yao's model requires it for ease of use.
I’m building a programming language called Yao. Why? Because I hate every other language besides C, and C is not memory safe.
Yao is designed to be the language that fits my brain, but it is also designed to be a language that anyone could use for anything. I'm going to use it to replace all of my personal shell scripts and Python scripts. I'm also going to rewrite my current project, a C project, in it.
Yao has been in design for just over 10 years now.
And I’m only just starting; the first program has been written, but the compiler cannot parse it yet.
This should mean that Yao has a great design, right? Only time will tell, but I personally think so. And that’s all that matters because I am the target user.
But at the same time, I do want others to use it, so here’s why Yao has gone backwards and adopted forward declarations.
It seems that forward declarations are despised.
Though not universally.
This is not comforting.
Nevertheless, I agree with the philosophy of information accumulation, for which forward declarations are almost always required.
There are many advantages to this:
- It makes the language design simpler.
- It also makes it easier to make files in the program not depend on each other.
- Compilers can be single-pass.
  - This makes the compiler simpler.
  - It also makes it faster.
  - It also makes it easier to reason about and change.
So why are they despised? I can only think of one reason:
- People don’t like having to update two places for every function/type.
One disadvantage against four; the decision to have forward declarations seems obvious.
The one disadvantage would have much greater weight, however, if the compiler didn’t help you fix the problem.
For example, it would matter more if the compiler didn't report an error when a forward declaration and its definition didn't match, or didn't tell you the locations of all relevant declarations and definitions.
But every compiler that I know of does that, so the problem for people seems to be that it adds extra work.
When I realized that, I wondered if I could take away that extra work.
And I can!
Language Server Protocol
One of the few good things that Microsoft has given us is the Language Server Protocol (LSP), an IPC protocol to turn text editors into IDEs and to add support for specific languages to IDEs without custom work.
I was first made aware of this after watching an interview with Anders Hejlsberg on Modern Compiler Construction, where he claimed (rightly!) that compilers are taught wrong nowadays. He said that compilers are different; they have to do all sorts of things, like return error lists or provide suggestions.
He’s talking about the services a modern IDE provides, of course.
I saw this talk soon after it came out. At the time, I had already been designing Yao for three years, so I knew that Anders was right; my compiler would have to be able to do what he talked about.
I admit that it was discouraging.
In that interview, Anders never says the phrase “Language Server Protocol,” but he does talk about it, even using the word “protocol” and mentioning JSON. So after that interview, I went down the rabbit hole and found the Language Server Protocol.
Back then, it was sparse. Now, it is complete enough to turn a plain text editor, like Neovim, into a full IDE.
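For readers who haven't looked at it, the LSP base protocol is simple. Every JSON-RPC message is sent as a Content-Length header giving the body's size in bytes, then a blank line, then the JSON body. The C sketch below frames one outgoing message. lsp_frame is a hypothetical helper written for illustration. It is not part of any LSP library, and it is not from Yao.

```c
#include <stdio.h>
#include <string.h>

/* Frames one JSON-RPC payload the way the LSP base protocol expects:
 * a Content-Length header (body size in bytes), a blank line, then the
 * JSON body. Returns the number of bytes written (excluding the
 * terminating NUL), or -1 if `out` is too small to hold the message. */
int lsp_frame(const char *json, char *out, size_t out_size)
{
    int n = snprintf(out, out_size, "Content-Length: %zu\r\n\r\n%s",
                     strlen(json), json);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```

A language server writes framed messages like this to the editor over stdout or a socket. The editor reads the header first, so it knows exactly how many bytes of JSON follow.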
Automatic Forward Declaration Management
A crucial LSP feature that didn't exist back then is the ability of the language server (an IPC server that wraps the compiler) to apply edits to a file.
This is useful in so many ways, such as renaming all calls of a renamed function, removing a removed function argument, etc.
I realized it could also be used to add, change, or remove forward declarations.
Say you are working on a Yao program in an IDE. You add a function below another function that calls it. The compiler still reports an error at the call site saying that the function does not exist. However, the language server can use extra smarts to realize that the function does exist, just later in the file. It can then helpfully suggest adding a forward declaration, which can be done with a single press of the Tab key.
In fact, I plan to let users configure the language server so that the IDE adds, changes, and removes forward declarations automatically. In that case, no extra work is needed, not even a single keypress.
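The automatic mode boils down to a diff. The forward declarations a file has are compared against the ones it needs, and the result is a list of add, change, and remove edits. Below is a minimal C sketch of that diff. It assumes that something else has already decided which definitions need forward declarations, and it treats signatures as opaque strings. All names here (struct decl, struct edit, sync_forward_decls) are hypothetical, not Yao's actual API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical edit kinds a language server could send to the editor. */
enum edit_kind { EDIT_ADD, EDIT_CHANGE, EDIT_REMOVE };

struct decl {
    const char *name;      /* item name */
    const char *signature; /* full declaration text, treated as opaque */
};

struct edit {
    enum edit_kind kind;
    const struct decl *decl; /* declaration to add, new text, or one to remove */
};

static const struct decl *find(const struct decl *list, size_t n,
                               const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i].name, name) == 0)
            return &list[i];
    return NULL;
}

/* Compares existing forward declarations (`have`) with required ones
 * (`need`) and writes the edits that bring them in sync to `out`, which
 * must have room for nhave + nneed edits. Returns the number of edits. */
size_t sync_forward_decls(const struct decl *have, size_t nhave,
                          const struct decl *need, size_t nneed,
                          struct edit *out)
{
    size_t nout = 0;

    for (size_t i = 0; i < nneed; i++) {
        const struct decl *d = find(have, nhave, need[i].name);
        if (d == NULL)
            out[nout++] = (struct edit){ EDIT_ADD, &need[i] };
        else if (strcmp(d->signature, need[i].signature) != 0)
            out[nout++] = (struct edit){ EDIT_CHANGE, &need[i] };
    }

    /* Declarations nobody needs anymore (e.g., the function was deleted). */
    for (size_t i = 0; i < nhave; i++)
        if (find(need, nneed, have[i].name) == NULL)
            out[nout++] = (struct edit){ EDIT_REMOVE, &have[i] };

    return nout;
}
```

Turning each edit into an actual LSP text edit (a range plus replacement text) is left out. The linear searches are fine for the handful of forward declarations a typical file has.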
If I do this right, then the one disadvantage should disappear. Unless people are unreasonable, that should make their complaints about Yao’s forward declarations disappear as well.
Necessity of Forward Declarations
Of course, as in C, forward declarations won't be necessary in all cases; usually, you can order definitions so that each one comes before its first use.
They will only be strictly necessary for mutually recursive functions or for mutually recursive struct types.
At least one of those types can only hold a pointer to the others, just like C.
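Here is what those two cases look like in C, the model Yao follows here. The struct declaration and the function prototype are the unavoidable forward declarations. No ordering of these definitions makes them unnecessary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Forward declaration: struct tree is incomplete until defined below,
 * so struct forest may only hold a pointer to it. */
struct tree;

struct forest {
    struct tree *first;  /* pointer only: struct tree is incomplete here */
    struct forest *rest;
};

struct tree {
    int value;
    struct forest children; /* by value: struct forest is complete */
};

/* Forward declaration: is_even calls is_odd before is_odd is defined. */
bool is_odd(unsigned n);

bool is_even(unsigned n) { return n == 0 ? true : is_odd(n - 1); }
bool is_odd(unsigned n) { return n == 0 ? false : is_even(n - 1); }

/* The struct types recurse, so the functions that walk them do too. */
int sum_tree(const struct tree *t);

int sum_forest(const struct forest *f)
{
    if (f == NULL || f->first == NULL)
        return 0;
    return sum_tree(f->first) + sum_forest(f->rest);
}

int sum_tree(const struct tree *t)
{
    return t->value + sum_forest(&t->children);
}
```

Swapping the order of either pair of functions just moves the problem, because each one calls the other.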
Speaking of C, part of the reason I chose to do this is because of the advantages that it gives C when providing a language server.
Anders says in the interview that compilers need a whole-project view, but this is only true in some languages.
In C, all you need to do is preprocess and parse the file itself. This is because the included headers should have all declarations necessary.
Yao will have somewhat of the same model, except that it will automatically generate the “headers” for you.
You import a package in a file, and Yao will read an “import” file that will have forward declarations for every public item in that imported package.
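Conceptually, generating such an import file is a filter over the package's items. Each public item contributes its forward declaration, and private items are skipped. The C sketch below illustrates the idea only. It assumes the compiler already has each item's declaration text. struct item and write_import_file are hypothetical names, and the real import file format is not specified here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* One item a package defines. `declaration` is the forward-declaration
 * text the compiler would emit for it (its syntax is assumed). */
struct item {
    const char *declaration;
    bool is_public;
};

/* Writes the forward declaration of every public item, one per line,
 * to `out`. Private items are skipped, so importing files only ever see
 * the package's public interface. Returns the number of lines written. */
size_t write_import_file(FILE *out, const struct item *items, size_t n)
{
    size_t written = 0;

    for (size_t i = 0; i < n; i++) {
        if (!items[i].is_public)
            continue;
        fprintf(out, "%s\n", items[i].declaration);
        written++;
    }
    return written;
}
```

A file that imports the package reads only this output, which is why it can be parsed without reparsing the package's source.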
This import mechanism does create a dependency, since the imported package needs to be built first, but it also means that a single file can be parsed alone if it was the only one that changed.
Anders talked about the architecture of the Roslyn C# compiler, and he said that it had to have the ability to restart computation anywhere.
For a company the size of Microsoft, it’s possible to create a compiler of that complexity, but Yao is just a side project for me; I have to simplify.
Of course, if my compiler were not fast by itself, I would have to do the same thing to avoid parsing the entire file.
But Yao is not C++ or Rust; I have a hard performance requirement of 1 million lines of code per second for one file.
At 1M LoC per second, I can parse 100,000 lines before hitting the 100 millisecond human limit.
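Turned around, the same numbers give the time budget per line:

```latex
\frac{100\ \text{ms}}{100{,}000\ \text{lines}}
  = \frac{10^{-1}\ \text{s}}{10^{5}\ \text{lines}}
  = 10^{-6}\ \text{s per line}
  = 1\ \mu\text{s per line}
```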
And if you have a 100,000+ LoC file, you deserve the performance pain you feel.
So my compiler will reparse an entire file on every keypress. Of course, the language server will use tricks like keeping the information from import files around or cancelling parses when another keypress arrives, but other than that, it will just zip through.
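Cancelling a stale parse can be as simple as a flag that the parser polls. The C sketch below assumes a threaded design: parsing runs on a worker thread, and the thread that reads editor messages sets the flag when a newer keypress arrives. That split is an assumption, and so are all the names. parse_one_token stands in for real parsing work.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct parse_job {
    atomic_bool cancelled; /* set when a newer keypress makes this parse stale */
    size_t ntokens;
    size_t next;           /* index of the next token to parse */
};

enum parse_result { PARSE_DONE, PARSE_CANCELLED };

/* Stand-in for real parsing work on one token. */
static void parse_one_token(struct parse_job *job) { job->next++; }

/* Parses until the end of the file or until cancelled, checking the flag
 * before every token. A real parser might check less often. */
enum parse_result parse_file(struct parse_job *job)
{
    while (job->next < job->ntokens) {
        if (atomic_load_explicit(&job->cancelled, memory_order_relaxed))
            return PARSE_CANCELLED;
        parse_one_token(job);
    }
    return PARSE_DONE;
}

/* Called from the message-reading thread when a new keypress arrives. */
void cancel_parse(struct parse_job *job)
{
    atomic_store_explicit(&job->cancelled, true, memory_order_relaxed);
}
```

Relaxed ordering is enough here because the flag carries no data. The parser only needs to see it eventually, and a cancelled job's partial results are simply thrown away.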
The biggest reason I chose this design, though, is that Yao's model is different.
Most languages use a plain lexer and a plain parser. Yao does not.
Yao’s lexer and parser can both be changed on the fly.
For the lexer, users can add modes that change how text is lexed.
For the parser, users can add keywords with custom parsing code.
These two things together give Yao the same power as Lisp macros and allow users to grow the language in unforeseen ways.
This has already proven useful; Yao’s first program is written in a domain-specific language with custom keywords (for adding build targets and the like) and a special shell lexing mode.
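The keyword half of this can be pictured as a table that maps keyword names to user-supplied parse functions. When a statement starts with a registered keyword, the parser hands control to that function, which claims whatever tokens follow. The minimal C sketch below covers keywords only, not lexer modes. Every name in it (struct parser, register_keyword, parse_statement, and the example parse_one_arg) is hypothetical, and this is not Yao's actual parser API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical parser state; a real one would also hold scopes and types. */
struct parser {
    const char **tokens;
    size_t ntokens;
    size_t pos;
};

/* User-supplied parse function for everything after its keyword.
 * Returns 0 on success and nonzero on a syntax error. */
typedef int (*keyword_fn)(struct parser *p);

struct keyword {
    const char *name;
    keyword_fn parse;
};

#define MAX_KEYWORDS 64

static struct keyword keywords[MAX_KEYWORDS];
static size_t nkeywords;

/* Lets user code extend the grammar. Returns 0 on success, -1 if full. */
int register_keyword(const char *name, keyword_fn parse)
{
    if (nkeywords == MAX_KEYWORDS)
        return -1;
    keywords[nkeywords++] = (struct keyword){ name, parse };
    return 0;
}

/* Dispatches one statement to its keyword's parse function.
 * Returns -1 if the statement does not start with a registered keyword. */
int parse_statement(struct parser *p)
{
    if (p->pos >= p->ntokens)
        return -1;
    for (size_t i = 0; i < nkeywords; i++) {
        if (strcmp(keywords[i].name, p->tokens[p->pos]) == 0) {
            p->pos++; /* consume the keyword itself */
            return keywords[i].parse(p);
        }
    }
    return -1; /* a real parser would fall back to built-in syntax here */
}

/* Example user keyword: takes exactly one argument token. */
static int parse_one_arg(struct parser *p)
{
    if (p->pos >= p->ntokens)
        return 1;
    p->pos++;
    return 0;
}
```

Because the user function decides how many tokens it consumes, the parser cannot know the shape of the rest of the file without running user code.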
But this also means that user code needs all the knowledge about the code that it can get. I can't have multiple passes because requiring user code to handle multiple passes just won't work. It's got to be one pass, and that pass needs every bit of knowledge it can get about the code.
This includes full type checking, by the way. Yes, the parser does the type checking.
User code will be expected to type check too, but in most cases, it will just call a parser API that handles it.
An example of type checking that users must do is for the target keyword in my build DSL: the dependencies are a comma-separated list, and every item in the list must be a string, so my DSL code does the type checking for that.
But again, the parser will provide an API to make that easy.
Without forward declarations, this would be impossible: in a single pass, code cannot type-check a use of an item whose declaration it has not seen yet.
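Here is a rough idea of what the dependency-list check might look like in user code. It assumes a token-level view: each item must be a string token, and items are separated by commas. Treating string literals as the only string-typed expressions is a simplification, since a real check would ask the parser for each item's type. enum tok_kind and check_string_list are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token kinds the parser hands to user code. The token array
 * must end with TOK_END so the check never reads past it. */
enum tok_kind { TOK_STRING, TOK_COMMA, TOK_IDENT, TOK_NUMBER, TOK_END };

/* Checks that tokens[start..] match STRING (COMMA STRING)*.
 * On success, returns true and sets *end to the index just past the list.
 * On failure, returns false and sets *end to the offending token. */
bool check_string_list(const enum tok_kind *tokens, size_t start, size_t *end)
{
    size_t i = start;

    for (;;) {
        if (tokens[i] != TOK_STRING) {
            *end = i; /* every item must be a string */
            return false;
        }
        i++;
        if (tokens[i] != TOK_COMMA)
            break;
        i++; /* a comma must be followed by another item */
    }

    *end = i;
    return true;
}
```

Reporting the index of the offending token lets the DSL attach its error to the exact item that has the wrong type.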
By the way, this model is another reason the compiler will reparse everything on every keypress; it's impossible to predict what user code will do.
Lest you think that user code upstream of a change can't possibly affect anything downstream of it, consider this: if you change the token after the closing brace of an if statement to else, the function that parses the if statement suddenly claims more tokens after the changed token and changes how they are interpreted.
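A toy version of that decision in C makes the effect concrete. The function below starts just after the if statement's closing brace and returns how many more tokens the statement owns. It is a heavily simplified, hypothetical sketch: it assumes the else block is a flat "{ ... }" with no nested braces, and it ignores else-if chains.

```c
#include <stddef.h>
#include <string.h>

/* Returns how many tokens after the closing brace of an if statement
 * belong to that statement. Toy assumption: the else block contains
 * no nested braces. */
size_t if_claims_tokens(const char **tokens, size_t ntokens, size_t after_brace)
{
    size_t i = after_brace;

    if (i < ntokens && strcmp(tokens[i], "else") == 0) {
        i++; /* claim the else keyword */
        if (i < ntokens && strcmp(tokens[i], "{") == 0) {
            /* claim everything up to and including the matching "}" */
            while (i < ntokens && strcmp(tokens[i], "}") != 0)
                i++;
            if (i < ntokens)
                i++;
        }
    }
    return i - after_brace;
}
```

With the tokens "foo { x }" after the brace, the if statement claims nothing more. Change that one "foo" to "else", and it claims all four tokens, so the block that used to stand alone is now part of the if statement.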
Designing for LSP
I may not have intended to until recently, but my language has been designed to work well with LSP.
This has some advantages.
- I can design my compiler around LSP and do that right upfront, avoiding the Rust problem where a separate analyzer (rust-analyzer) is necessary for LSP.
- I can build LSP along with the compiler, giving me IDE tools early.
- I can leverage LSP for extra smarts that other languages probably would not have.
- I can avoid the Zig problem, where a custom IDE or a super complex LSP server is needed, allowing users to use whatever editor they want.
- Finally, I can use LSP to solve some problems, like forward declarations, thus moving some complexity from the compiler to the language server so that the compiler doesn’t become overly complex.
That last point is important: good software design includes putting whatever complexity must exist where it best fits.
In Yao’s case, this means putting some user convenience code in the language server.
Yes, not requiring forward declarations is purely for user convenience.
Of course, the compiler can help where it makes sense, for example, by giving the language server an error code that tells it a particular error might be a missing forward declaration.
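LSP diagnostics already have an optional code field, and a server can check it before offering a quick fix through a code action. The C sketch below shows how a compiler might pick that code. It assumes the compiler knows, by the end of the file, every definition and the line it appears on. The enum values and classify_undeclared are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical error codes the compiler could attach to diagnostics. */
enum error_code {
    ERR_UNKNOWN_NAME,  /* not defined anywhere in the file */
    ERR_DECLARED_LATER /* defined later: likely a missing forward declaration */
};

struct definition {
    const char *name;
    size_t line; /* line where the item is defined */
};

/* Either way, the use is still a compile error; the distinct code only
 * tells the language server that a forward-declaration fix applies. */
enum error_code classify_undeclared(const char *name, size_t use_line,
                                    const struct definition *defs, size_t ndefs)
{
    for (size_t i = 0; i < ndefs; i++)
        if (defs[i].line > use_line && strcmp(defs[i].name, name) == 0)
            return ERR_DECLARED_LATER;
    return ERR_UNKNOWN_NAME;
}
```

This keeps the split described above. The compiler only labels the error, and the language server owns the convenience of writing the declaration.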
Yes, I know that this decision goes against 50 years of language design.
But just claiming that 50 years of language design say otherwise is not entirely honest; there are other variables at play. Things exist that didn’t back then.
LSP is one of those variables.
Not only that, but making this decision with an eye on LSP would be stupid if LSP were not winning in the marketplace.
But LSP does exist, and it is winning. I feel comfortable that it, or something like it, will always exist in the future, so I feel comfortable betting this design decision on it.
If there’s any conclusion to this post, it’s this:
Design should never happen in a vacuum.