The way we're thinking about breaking changes is really silly
20 points by kylewlacy
In practice, what this means is that we essentially don’t directly allow changing a function’s type ever.
As someone who has written and interacted with way too many APIs over the years: This is a feature, not a bug. I am happy that this is the case.
Unfortunately, union types break parametricity and have pretty poor nesting behavior
This is only the case for TypeScript, because it automatically flattens null. Languages with a more rigorous type system, like Haskell or Rust, are not subject to this problem. For example, in Rust you can nest Option and then use .flatten() explicitly if you want to get rid of the option nesting.
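Concretely (a quick std-only sketch):

fn main() {
    // Nested options keep "inner value missing" and "outer value missing" distinct.
    let inner_missing: Option<Option<i64>> = Some(None);
    let outer_missing: Option<Option<i64>> = None;
    assert_ne!(inner_missing, outer_missing);

    // .flatten() collapses exactly one level of nesting, and only when asked.
    assert_eq!(Some(Some(5)).flatten(), Some(5));
    assert_eq!(inner_missing.flatten(), None::<i64>);
}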
So how do we prevent old call sites from breaking without having to keep supporting the exact way they were called into all eternity? Migrations!
Exactly. And the migration is that you change the name of the function and deprecate the old one, ideally with a generous timeline to make it easier for everyone to deal with the change - which in practice might indeed be automated. I don’t think I would look forward to a future where I have to trawl through migration macros and migration files to understand what my code does. But I do agree with the general sentiment that we could be more rigorous and automate more.
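For what it’s worth, Rust already ships lightweight machinery for exactly this rename-and-deprecate flow; a sketch (the function names here are made up):

pub fn parse_config(path: &str) -> String {
    // New implementation lives under the new name.
    std::fs::read_to_string(path).unwrap_or_default()
}

#[deprecated(since = "1.4.0", note = "renamed to `parse_config`; `load_config` will be removed in 2.0")]
pub fn load_config(path: &str) -> String {
    parse_config(path)
}

Every remaining call site of load_config gets a compiler warning pointing at the replacement, which makes the generous timeline cheap to enforce.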
Unfortunately, union types break parametricity and have pretty poor nesting behavior
This is only the case for TypeScript, because it automatically flattens null. Languages with a more rigorous type system, like Haskell or Rust, are not subject to this problem.
It isn’t specific to TypeScript and it isn’t a matter of being less rigorous; it’s the difference between union types and sum types. TypeScript is one of the few languages with proper union types, because they are useful for compatibility with JavaScript, but they have downsides. The flattening isn’t specific to null: it happens for all types in a union, whether written explicitly or substituted in through a type variable, which is what breaks parametricity.
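A rough illustration of the parametricity half from the Rust side, where wrapping is a sum type and therefore uniform in the type variable:

fn wrap<T>(x: T) -> Option<T> {
    // Adds exactly one layer for every T, even when T is itself an Option.
    Some(x)
}

fn main() {
    let nested: Option<Option<i64>> = wrap(wrap(1));
    assert_eq!(nested, Some(Some(1)));
}

With a flattening union, the equivalent of wrap would fail to add a layer whenever T already contained null, so generic code could no longer rely on the shape of its own return type.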
Something similar happens with any dynamic language. Someone argued quite convincingly that GNUstep should do an SONAME version bump on every release that added any methods to a class, because any code that used -respondsToSelector: with the selector of the new method would have its behaviour changed.
Structural types have the same problem. Adding a method makes pattern matches on types succeed where they would have previously failed.
The key requirement for this to be a problem is that changes to the type are observable in control flow.
The problem with implicit migration is that the new version exists for a reason. If you could mechanically rewrite code from the old version to the new, don’t make the change. Some people might be detecting the method because they actually wanted an exhaustive match on types and were using it as a tag. Some might be doing the check because they want to use the new method if it’s added. If you don’t know why people are relying on specific behaviour in your code, you can’t change their code mechanically when you change that behaviour.
Unfortunately, union types break parametricity and have pretty poor nesting behavior This is only the case for TypeScript, because it automatically flattens null. Languages with a more rigorous type system, like Haskell or Rust, are not subject to this problem.
Rust doesn’t HAVE union types. You can write a function in Rust that takes an i64. You can write a function in Rust that takes a Maybe<i64>, which is either Some(i64) or Nothing. But you cannot write a function in Rust that takes an i64 OR a Nothing. Imagine that you could. Imagine in Rust we were allowed to declare EITHER

fn f(x: Maybe) { … }

or

fn f(x: i64 | Nothing) { … }

The second one would be convenient when we change code. Suppose in V1.0 we have:

fn f(x: i64) { … }
…
let y: i64 = 36;
return f(y);

Then in V2.0 we decide that f() needs to change so that x can optionally be Nothing. In current Rust we must do this:

fn f(x: Maybe) { … }
…
let y: Maybe = Some(36);
return f(y);

…which required us to change the place where f() was USED as well as changing where it was declared. If Rust supported union types then we could change ONLY the declaration site, like this:

fn f(x: i64 | Nothing) { … }
…
let y: i64 = 36;
return f(y);

That’s pretty convenient! And Rust can’t do it today. But, as the article explains, it is possible to “nest” Maybe types, to create something like Maybe<Maybe<i64>>, and the same thing cannot be done for union types: i64 | Nothing | Nothing is just repeating itself.
You can write a function in Rust that takes a Maybe, which is either Some(i64) or Nothing.
Tiny note: Maybe is Option in Rust, with Some and None.
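Worth adding that current Rust can actually cover the “only change the declaration site” case from the parent comment, thanks to the std impl From<T> for Option<T>. A minimal sketch:

fn f(x: impl Into<Option<i64>>) {
    // Accepts 36, Some(36), or None alike.
    match x.into() {
        Some(n) => println!("got {n}"),
        None => println!("got nothing"),
    }
}

fn main() {
    let y: i64 = 36;
    f(y); // old call site compiles unchanged
    f(None); // new call sites can pass nothing
}

It’s not a union type - the widening is still explicit in the signature - but old call sites keep compiling without edits.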
I think it’s a great idea to formalize such migrations, but I don’t think they have any business being handled by the compiler; at best it could be handled by the package manager.
In any case, it sounds very challenging to implement in practice, especially if you involve inferred types and resolved symbols, because then it means you have to compile both versions of the code and compare some compiler-generated metadata across the two versions. So imagine how hairy it gets if you have a project with 400 dependencies and you try to update them after a few months: there will probably be hundreds of such migrations spread across various versions of all these dependencies. How do you then step through them? Maybe iteratively bump the package versions, recompile everything, and perform one migration step at a time? But then how do you deal with compilation errors? I.e., if I’m upgrading from some-dep-v1.1 to some-dep-v1.5, do I now have to deal with the compilation issues of all the intermediate versions? (And the way those versions interact with all the other dependencies at some intermediate versions?) Now add to this the need to upgrade the compiler, and maybe even the package manager itself, and you have a very hairy problem.
Maybe a much simpler version of this great idea could take us 90% of the way. Like when you perform a dependency update, your package manager keeps a record of what the dependency versions were before the upgrade, so that when you recompile and get compilation errors, it can check whether any of the functions around the compilation error has a “breaking change notice” attached to it and display that as part of the error message, so that you or your AI assistant have better context.
Really good point. This feels like a similar problem to what Microsoft had in the 2010s for C#: they had a compiler that had full analysis of a program internally, but externally it was a black box that gave you a binary at the end… and they also had an IDE that needed deep knowledge of inferred types, extension methods, etc. That led to Roslyn, where the core of the compiler could act as a standalone API which Visual Studio could then leverage. Roslyn was the first domino that led to the modern Language Server Protocol today.
I agree that a “code migration” feature lives in the domain of a package manager, but like you mentioned, it’d be impossible to handle every edge case without deep program analysis. It would either require a compiler that exposes a full API (like C#’s Roslyn or TypeScript), or a standardized protocol like we got with LSPs that compilers would implement…
I think you’re right about keeping it simple too. This looks like an 80/20 problem, where you could get 80% of the value with 20% of the work by ignoring the hairy parts.
In PHP we have something that resembles what OP is describing: https://github.com/rectorphp/rector
Rector allows you to write migrations that apply to your code: language version updates, framework updates, function signature changes, etc.
This sounds like the evergreen migrations that Lamdera does: https://dashboard.lamdera.app/docs/evergreen
Solving this with migrations doesn’t feel great. You’ve gone from “A guarantees what B requires” to “A guarantees more than what B requires”. Both of these are morally fine, and both versions should be compatible at the same time.
In, say, Java, you’d just make the original signature an overload that calls the new one, understanding that it’ll live a very long time if not forever. I don’t think there’s anything wrong with that.
My first thought was “great idea”. But then I realized that it means my code is automatically modified by code I downloaded, which I don’t like. The thought of going through diffs after a package update, to check the migrations… The big difference from DBs is: you write your migrations for your own databases, not for someone else’s, which makes this orders of magnitude harder. Also, sometimes you sadly need to downgrade packages, so the migrations would need to be reversible.
I like the idea but I think people are going to get hung up on what the specific workflow should be. I think ideally you run a command to ask the compiler to apply the macros to transform the source and then commit the new version. Alternatively you could label migrations as being between particular versions, and have imports specify a version, but over time this will increasingly obfuscate things for readers because more and more macros will be implicitly applied to the code they think they are reading, and this is particularly dangerous if there are side effects.
I use a similar pattern for wildcard imports in my python code today, I define foolib_v1.py that you are allowed to wildcard import, but that I never add new definitions to. When I want to add a definition I instead make foolib_v2.py.
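A rough Rust analogue of the same freeze-and-version pattern, using glob re-exports (module and function names made up):

mod foolib {
    pub mod v1 {
        // Frozen: no new items are ever added here, so glob imports stay safe.
        pub fn frob() -> i64 { 42 }
    }
    pub mod v2 {
        // Additions land here; v2 re-exports everything from v1.
        pub use super::v1::*;
        pub fn new_feature() -> i64 { 7 }
    }
}

use foolib::v1::*;

fn main() {
    println!("{}", frob()); // immune to name collisions from future releases
    println!("{}", foolib::v2::new_feature());
}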
Unison addresses this problem by not supporting breaking changes. Function definitions are addressed by content hash, not by name. It supports a first-class refactoring process that can be used for upgrading dependencies. It is also possible to use multiple versions of a dependency at once.
Have a problem. Introduce migrations. Now have two problems.
If you don’t control all of the usage sites, then the way to make a parameter optional is to add a new function with the new logic, and modify the old function to call the new function. Problem solved.
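A minimal sketch of that recipe (names hypothetical):

pub fn scale_or_default(value: i64, factor: Option<i64>) -> i64 {
    // New logic: the factor is now optional.
    value * factor.unwrap_or(1)
}

pub fn scale(value: i64, factor: i64) -> i64 {
    // Old signature stays exactly as it was; it just forwards.
    scale_or_default(value, Some(factor))
}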