Assumed Audience: Hackers, programmers, and code architects.

Epistemic Status: Confident.

Introduction

There is a man I admire. His name is Dr. David Chisnall.

Well, there was an article complaining about Semantic Versioning, and Dr. Chisnall decided to weigh in.

He started like this:

I should write a blog about this somewhere so I can cite it and stop repeating it…

I have been waiting two months for that blog post.

Well, I need that blog post right now for another one of mine, so I’m going to write the post for him.

With apologies to Dr. Chisnall, this post is his ideas in my own words.

There isn’t a single original thought in this blog post. Do go read his comment.

The Problem

Dr. Chisnall’s thesis is simple:

[T]he core problem with SemVer is that it is used to version implementations, not interfaces.

Now, if you’re smart, you probably understood what he meant right away.

Me? I’m dumb and had to read his comment 4 times to get it. So let me use more words to convince myself I’m on the right track.

SemVer only talks about “backwards compatibility,” and breaking it can mean anything from “changing the name of the most used function” to “hey, your just-freed memory cannot be reused anymore.”

Or as XKCD puts it, “your spacebar heating is gone.”

XKCD: Workflow

In this world, every change has the potential to be backwards incompatible for someone. Even bug fixes.

Why is that? Because bug compatibility.

Or in other words, Hyrum’s Law:

With a sufficient number of users of an API,
it does not matter what you promise in the contract:
all observable behaviors of your system
will be depended on by somebody.

Basically, SemVer defines its contract in terms of “backwards compatibility,” and because the contract is about behavior rather than a stated interface, users expect all observable behaviors to be preserved.

This is what Dr. Chisnall meant when he said,

There are more subtle problems that relate to how richer type systems interact with the guarantees in SemVer. For example, anything that does pattern matching on structural types makes adding or removing a feature a breaking change.

The well-read will recognize this as an instance of the Expression Problem.[1]

This applies to things like structs in C as well: how big the struct is, whether it has a certain function pointer, and so on. It’s not just a problem for the highest-level languages.

And this is why he claimed that SemVer versions implementations: backwards compatibility is a property of implementations (like pattern matching changes!), not interfaces.
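To convince myself of the pattern matching point, here is a minimal Python sketch. Everything in it is made up for illustration: a hypothetical library returns shapes as tagged tuples, and a "minor" release adds one new kind of shape. Downstream code that matched on the old structure breaks, even though nothing was removed.

```python
import math

# Hypothetical library, interface 1.0: shapes are tagged tuples.
def shapes_v1():
    return [("circle", 2.0), ("square", 3.0)]

# Hypothetical "minor" release 1.1: a feature is added (a new shape kind).
def shapes_v1_1():
    return shapes_v1() + [("rect", 2.0, 5.0)]

# Downstream code written against 1.0. It matches on structure,
# so it only knows the cases that existed when it was written.
def area(shape):
    kind = shape[0]
    if kind == "circle":
        return math.pi * shape[1] ** 2
    if kind == "square":
        return shape[1] ** 2
    raise ValueError(f"unknown shape: {kind}")

def total_area(shapes):
    return sum(area(s) for s in shapes)
```

Calling `total_area(shapes_v1())` works, but `total_area(shapes_v1_1())` raises `ValueError`. The library only added something, yet the downstream code is broken.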

The Solution

So the solution is versioning interfaces, but what does that even mean?

It means attaching the contract to the interface instead.

“Yeah, Gavin, that is so much clearer.”

Okay, let me try with a concrete example.

Let’s say you have an implementation with an interface, and you are versioning the implementation, as usual.

You want to change the behavior in a breaking way. Maybe your API is just…the worst. Or maybe you have a bug in a protocol.

Well, you have one choice: change the behavior while using the same interface and bump the “major” version.

Okay, that was easy, but that was because you knew your change would break things.

What if you don’t know that? What if it’s a small bug fix?

In that case, you have two choices:

  1. Assume the worst and bump the major version.
  2. Assume the best and do not bump the major version.

Let’s say you’re careful and you go for the first option.

Cool, people know that there might be a problem. You were thinking about your users.

Some of those users check the change, do some testing, find no problems, and upgrade to the new version.

…and some do not; they have pinned your implementation to the previous major version, and they never receive your bug fix. You keep getting people reporting the bug over and over again because they don’t upgrade.

That’s the best case scenario!

And that is why a lot of developers bias toward new major versions for everything.

The worst is if you go for option number 2.

In that case, you might have had a good bet; your bug fix didn’t break anything and everyone is happy. Whew!

But in a Turing-complete world, that is not a guarantee. So let’s say you made a bad bet, and the bug fix did break something. What happens?

Well, if you’re lucky, someone will notice and tell you. In that case, you bump the major version.

You could revert to the old behavior instead, but that is usually not considered for bug fixes.

But then all of the problems I just mentioned about the first option happen! There are some users who don’t upgrade.

Even worse, they may have upgraded to the version with the bug fix, and their code might be broken without them knowing.

And that brings us to the worst case scenario: you break everyone’s code, and no one notices!

In implementation versioning, this is the standard state.

Why? Because you are using one global identifier to communicate everything about your implementation.

Trying to distill Turing-complete code into one identifier seems like madness.

So implementation versioning is one blob of code with one version.

On the other hand, interface versioning is one or more blobs of code, each with its own version.

It’s a subtle distinction, but a crucial one.

Let’s repeat the scenario above, but with interface versioning.

You made that bug fix, but didn’t bump the major version. It broke someone’s stuff, and they told you.

What do you do? You want to prevent breaking users’ code, but you also want to keep the bug fix.

Well, you copy that code blob into another code blob. Now you have a previous blob and a current blob.

You take the previous blob and revert the change and give it a new minor version. But the current blob keeps the fix, and you bump its major version.

So now you have a previous interface on a new minor version and a current interface on a new major version.

Then you ship both.

This means that users who bump minor versions will end up on a version without the fix, and their code won’t be broken. But users who put in work can still upgrade to the bug fix.

That is interface versioning.
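Here is a minimal sketch of what shipping both blobs could look like in Python. The class names, the version numbers, and the `split_words` function are all hypothetical; the only point is that the previous interface gets the reverted behavior under a new minor version, and the current interface keeps the fix under a new major version.

```python
class PreviousV1:
    """Previous interface: the fix is reverted, so old callers keep working."""
    VERSION = (1, 3)  # assumed: a new minor version of the 1.x interface

    @staticmethod
    def split_words(text):
        # Old behavior that someone depended on:
        # consecutive spaces produce empty strings.
        return text.split(" ")


class CurrentV2:
    """Current interface: keeps the bug fix, behind a major version bump."""
    VERSION = (2, 0)  # assumed: the new major version

    @staticmethod
    def split_words(text):
        # Fixed behavior: runs of whitespace are collapsed.
        return text.split()
```

A user who stays on `PreviousV1` gets `["a", "", "b"]` from `split_words("a  b")`, exactly as before. A user who opts in to `CurrentV2` gets `["a", "b"]`.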

“Yes, but Gavin, you’re just shipping two versions! That seems like more work!”

Au contraire! In the above example, previous is effectively frozen, so it isn’t any extra work.

I mean, you could provide some maintenance on previous…for a price.

That’s called an opportunity.

And in return, you get to worry less about what user code you might be breaking. This sounds perfect for a lazy programmer like me.

More Interface Versioning Advantages

Okay, let’s assume there are some non-lazy programmers who like worrying.

Are there any reasons why they should choose to version interfaces over implementations?

Of course!

Graceful Deprecation

The first is graceful deprecation. In the words of Dr. Chisnall,

You cannot do graceful deprecation with SemVer. In a project with a good support cycle, you have three states for interfaces within an implementation:

  1. Supported.
  2. Present but deprecated.
  3. Gone.

Each release will cycle interfaces through this little state machine. You cannot express this if you’re using SemVer for the implementation. If your library supports an interface Foo, you have three versions in SemVer:

  • 1.0 - Foo is supported.
  • 1.1 - Foo is deprecated, Bar is supported.
  • 2.0 - Foo is gone, Bar is supported (hopefully not deprecated already)

1.1 to 2.0 is not a breaking change for anyone that moved from Foo to Bar, but there’s no way, if you are using SemVer for implementations to indicate this. You may even have more complicated things such as

  • 1.0 - Foo is supported.
  • 1.1 - Foo is supported but has some new features.
  • 1.2 - Foo is deprecated, Bar is supported.
  • 2.0 - Foo is gone, Bar is supported (hopefully not deprecated already)

Now moving from 1.1 to 2.0 is a breaking change for everyone, but moving from 1.2 to 2.0 is not for anyone who is heeding their deprecation warnings.

Having 1.1 to 2.0 be a breaking change, but not 1.2 to 2.0, is nasty.

However, when versioning interfaces, you can combine multiple interface versions into one SemVer version. Dr. Chisnall said,

The thing that you want is to use SemVer for interfaces, where each version of the implementation has a tuple of interface versions. Now the flow is easy:

  • {1.0} (Foo is supported)
  • {1.1} (Foo is supported and has new features)
  • {1.1, 2.0} (Foo is supported as is Bar)
  • {2.0} (Foo is gone, Bar remains)

Now, if your dependency resolution first says ‘I need 1.x’ then it will match the first three versions. When you get to the third, it will say ‘by the way, there’s a newer thing you might want to migrate to’. Then you update it to say 2.0 and it still works with the third one, but will allow you to move to the fourth.

In other words, multiple interface versions can coexist under one implementation version, so downstream users can more easily move to new versions.
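The resolution flow Dr. Chisnall describes can be sketched in a few lines of Python. This is only my reading of his comment, not a real package manager: the release names `r1` through `r4` and the function names are assumptions, and each release simply carries a list of interface versions.

```python
# The four implementation releases from the quote above,
# each with its tuple of (major, minor) interface versions.
RELEASES = [
    {"release": "r1", "interfaces": [(1, 0)]},
    {"release": "r2", "interfaces": [(1, 1)]},
    {"release": "r3", "interfaces": [(1, 1), (2, 0)]},
    {"release": "r4", "interfaces": [(2, 0)]},
]

def satisfies(release, wanted_major):
    """True if the release ships any interface with the wanted major version."""
    return any(major == wanted_major for major, _ in release["interfaces"])

def resolve(wanted_major):
    """Names of every release that can serve a 'wanted_major.x' request."""
    return [r["release"] for r in RELEASES if satisfies(r, wanted_major)]

def newer_majors(release, wanted_major):
    """Newer interfaces in this release: the 'you might want to migrate' hint."""
    return sorted({major for major, _ in release["interfaces"] if major > wanted_major})
```

Asking for 1.x matches `r1`, `r2`, and `r3`. On `r3`, `newer_majors` reports that 2.x exists. Asking for 2.x matches `r3` and `r4`, so the user can switch their requirement while still on `r3` and then move to `r4`.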

You Can Still Version Implementations

Another quirk of versioning interfaces is that you can still version implementations!

How do you do this? Easy: just follow the example above and tie one implementation to one interface version.

This isn’t an all-or-nothing thing, either; it’s as granular as you want.

You can freeze one function in the previous interface version, and just have a completely new implementation of that function in the current version. You can do this for two, three, or 42 functions.

You can do this at the type level too!
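At the function level, a rough Python sketch might look like this. The names `api_v1`, `api_v2`, `parse`, and `render` are hypothetical. Only `parse` differs between the two interface versions; `render` is the very same function in both.

```python
from types import SimpleNamespace

def render(items):
    # Shared by both interface versions; nothing about it changed.
    return ", ".join(items)

def parse_v1(text):
    # Frozen implementation for the previous interface version.
    return text.split(",")

def parse_v2(text):
    # Completely new implementation for the current interface version:
    # trims whitespace and drops empty entries.
    return [part.strip() for part in text.split(",") if part.strip()]

# Each interface version is just a bundle of the functions it exposes.
api_v1 = SimpleNamespace(parse=parse_v1, render=render)
api_v2 = SimpleNamespace(parse=parse_v2, render=render)
```

The design choice here is that a new interface version only costs you the functions that actually changed. Everything else is shared.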

Wholesale Upgrades Still Work

But perhaps you cannot have two versions of an API at once, like in this comment.

For example, a C project where symbols might clash because namespaces were tortured, symbol versioning was banished, and version macros were launched into the sun.

What? I already said I’m lazy.

In that case, your users have to choose whether to adopt a new API wholesale or stay on the old one.

Is versioning implementations better in that case?

Nope!

I mean, upstream can get it wrong and only offer one at a time.

But that is just implementation versioning in disguise.

Or they can be smarter and offer multiple at a time.

This preserves graceful deprecation for downstream users by giving them time to adapt while both are supported.

This is what I will do in my current project: there will be build options for the API version of each interface, and there will be preprocessor guards around specific code. You choose the API versions at build time, and that’s what you get.

And when I finally remove old API versions, I can make selecting those versions a hard build error.

But you will still have the option of graceful deprecation.

And most importantly, it will be easy for me to document the breaking changes.
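The plan above uses build options and preprocessor guards. As a rough analogy only, here is the same policy sketched in Python, where the choice happens when the program starts rather than at build time. The version numbers and the `select_api` name are hypothetical.

```python
import warnings

# Assumed state of one interface: version 1 was removed,
# version 2 is deprecated but still shipped, version 3 is current.
REMOVED = {1}
DEPRECATED = {2}
SUPPORTED = {2, 3}

def select_api(version):
    """Validate the API version a user asked for, the way a build option would."""
    if version in REMOVED:
        # The analogue of a hard build error for a removed API version.
        raise RuntimeError(f"API version {version} has been removed")
    if version not in SUPPORTED:
        raise ValueError(f"unknown API version {version}")
    if version in DEPRECATED:
        # Graceful deprecation: it still works, but the user is told to move.
        warnings.warn(
            f"API version {version} is deprecated",
            DeprecationWarning,
            stacklevel=2,
        )
    return version
```

Selecting version 3 is silent, version 2 works with a warning, and version 1 fails loudly.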

Monorepo Versioning

My current project is a monorepo with multiple pieces of software.

Is it possible to version multiple pieces of software together with implementation versioning?

Theoretically yes, but you’re going to have a lot of breaking changes.

But when you version interfaces, it is much easier to have disparate parts of the repo under different versions.

But do have some way for users to figure out what interface versions exist. A JSON file should do.
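As a sketch of what that JSON file could contain, here is one possible layout and a small Python reader for it. The schema and the interface names (`parser`, `network`) are my assumptions; any format that maps each interface to its available versions would do.

```python
import json

# Hypothetical manifest: each interface maps to the versions that exist.
MANIFEST = """
{
  "parser": ["1.4", "2.0"],
  "network": ["3.1"]
}
"""

def interface_versions(manifest_text):
    """Parse the manifest into {interface name: [(major, minor), ...]}."""
    data = json.loads(manifest_text)
    return {
        name: [tuple(int(part) for part in version.split(".")) for version in versions]
        for name, versions in data.items()
    }
```

With this manifest, `interface_versions(MANIFEST)["parser"]` is `[(1, 4), (2, 0)]`, so a user can see that both the 1.x and 2.x parser interfaces exist.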

Conclusion

It turns out that “globals [versions] are bad,” and local versions are good.

Who knew?

Anyway, if you don’t use interface versioning, I encourage you to start; you can always split one code blob into many, no matter what state that blob is in.

Of course, you may have to figure out a new global versioning scheme, but hey, you can actually try CalVer!


  1. “The Expression Problem is a new name for an old problem. The goal is to define a datatype by cases, where one can add new cases to the datatype and new functions over the datatype, without recompiling existing code, and while retaining static type safety (e.g., no casts).” Philip Wadler, 12 November 1998