Pyrefly vs. ty: Comparing Python’s Two New Rust-Based Type Checkers
39 points by hoistbypetard
[ty developer here]
We are happy with the attention that ty is starting to receive, but it’s important to call out that both ty and pyrefly are still incomplete! (OP mentions this, but it’s worth emphasizing again here.)
There are definitely examples cropping up that hit features that are not yet implemented. So when you encounter something where you think what we’re doing is daft, please recognize that we might have just not gotten around to that yet. Python is a big language!
pyrefly, mypy, and pyright all assume that
my_list.append("foo")
is a typing error, even though it is technically allowed (Python collections can have multiple types of objects!). If this is the intended behavior, ty is the only checker that implicitly allows this without requiring additional explicit typing on my_list.
This is where we have to step back and talk about goals. What the Python type-checking community has been doing is, effectively, defining a parallel language – call it “Typethon” – which just happens to be syntactically compatible with Python (or, mostly is, and when it isn’t, Python changes to accommodate), despite having different semantics. So the question for any given Python type checker is: are you a “type checker for Python”, or are you a type checker for Typethon?
The quoted passage above highlights one of the differences: a Python list is a heterogeneous container. “List of what?” is not a useful question to ask of a Python list, because any given list might contain items all of the same type, or might contain items of many different types, etc. But in Typethon, lists are homogeneous and there effectively is no such thing as standalone list, only list[T]. Something similar is true of mapping types: in Python the type dict is a mapping of nearly arbitrary keys and values (I say “nearly” only because hashability requirements do exist), while in Typethon there is, again, no dict, only dict[K, V] (if you must have a dict whose values are not all of one type V, you have to use the TypedDict construct to specify the exact set of literal keys and types of values you expect).
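For illustration, here is a minimal TypedDict sketch (the ServerConfig name and its fields are invented for this example, not taken from the thread):

```python
from typing import TypedDict

# Hypothetical config mapping: each literal key has its own value type,
# which a plain dict[K, V] annotation cannot express.
class ServerConfig(TypedDict):
    host: str
    port: int
    debug: bool

# At runtime this is just a dict; the structure is only checked statically.
cfg: ServerConfig = {"host": "localhost", "port": 8080, "debug": False}
```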
This also leads to “fun” times for people porting existing code to use type annotations, because a lot of common Python things don’t work in Typethon. For example, if you have a list[str | None] (say, obtained from reading a bunch of possibly-missing configuration values), you cannot use type narrowing operations to get to a list[str] no matter what you do, because Typethon defines list as an invariant type. You have to first go back and explicitly change the annotation to something like Sequence[str | None] which – because Sequence is covariant – does let you narrow (but then gets into yet more difficulty when you realize that in Typethon Sequence is immutable).
Some of the above will be familiar to people who’ve read my previous comments on experiences with Typethon.
But in Typethon, lists are homogeneous and there effectively is no such thing as standalone list, only list[T].
This isn’t true. You could have a list[Any] if you really want that.
It is absolutely true that “typed Python” requires lists to be homogeneous. If it didn’t, then the example I quoted would not be an error in existing type checkers. Workarounds like explicitly annotating list[Any] or list[object] are just adopting extremely broad types for the T in list[T] in order to allow inserting items of disparate types.
Not only do I not see the practical difference, I don’t understand the distinction you are trying to draw. Annotation with the root (“top”) type is the same thing as having no type constraint.
Annotation with the root (“top”) type is the same thing as having no type constraint.
In Python, if I know that whatever constructs the list always puts a str in the last position, I can do some_list[-1].lower() and it’s fine.
In Typethon, even if I know that about the list, the type checker doesn’t (and as far as I know can’t express this) – it only knows a very broad type which likely does not have a lower() method.
That’s true, but is this a reasonable way of programming? Wouldn’t you be better off with a named tuple or object for this kind of application where you know that something is always there?
Are heterogeneous lists and collections that common though? Especially considering the additional complexity they introduce in iterators, generator expressions etc.
Python is routinely used by people who do not think of themselves as programmers, and who primarily just try things until they get it to work.
So yes, lots of things you think are bad ideas are likely extremely common in real-world Python code.
and this is essentially what ty currently infers for x in x = [5]!
https://play.ty.dev/35c00316-512a-4027-a95e-6fd84da6005c (Unknown is basically Any)
Well, that’s what we currently infer, but not really on purpose — that’s just something we haven’t gotten around to implementing properly yet. (We didn’t have support for generic classes at all until ~3 weeks ago)
There is an open debate about what specifically we should infer that to. It’s pretty uncontroversial to prefer list[int] over list[Literal[5]]. But we also plan to implement some bidirectional type-checking, so that if you call list.append("string") later on, we might infer list[int | str]. (It’s a debate because it’s equally justifiable to infer list[int] on the assumption that the append call is an error!)
I broadly agree with you - for example, with Typescript, I think it’s better to think of it as a subset, rather than a superset of Javascript, because the set of valid Typescript programs is a subset of the set of valid Javascript programs.
That said, one of the things that has made Typescript work so well is that the type system is designed to be as close to Javascript as possible, such that the set of valid Javascript programs that cannot be represented in Typescript is continually shrinking. Idiomatic Typescript should largely look like idiomatic Javascript, and if there are differences, this is treated as an issue that Typescript needs to solve.
My impression with “Typethon” as you put it, is that this is less the case, and that a lot of good, valid, idiomatic Python programs are nearly impossible to correctly type using these type checkers. As a result, this splits Python more deeply into two communities, one that structures its code according to the type checker, and one that structures it according to the metaprogramming functionality that Python offers. And I’m not sure right now if there’s a good strategy for bringing these two communities back together.
Thank you very much for posting this and the previous comment.
Every time I claim that mypy is useless, someone tells me that I’m doing it wrong. It is good to see even others felt that it is useless.
Except it’s not useless. That’s your opinion (and a wrong one, at that). Type checkers do prevent real errors. Types are real, even if you want to ignore they exist.
I already run a linting suite that includes flake8 and pylint. I have experimented with mypy, and I have never seen mypy catch a genuine bug that the other linters didn’t. I have seen mypy force me to fight with it and restructure code and implement workarounds to deal with limitations of mypy. The type narrowing saga I linked to is just one example.
Here’s another one that bites me a lot: I like to have centralized error handling in libraries, which means that often I catch an exception, and hand it off to a function which logs it and then either re-raises it or wraps it in another library-specific exception. I don’t think this is a weird or way-out-there thing.
But while mypy is fine with this (to take a silly example):
def my_function(input: int) -> int:
    if input == 3:
        return 3
    raise ValueError("Input has to be 3")
It is not fine with this:
def my_function(input: int) -> int:
    if input == 3:
        return 3
    error_handler(ValueError, "Input has to be 3")
Even when every path through error_handler raises, mypy insists that my_function has a path which returns None. It’s only if you go explicitly annotate error_handler as NoReturn (on Python < 3.11) or Never (on Python >= 3.11) that mypy finally decides this is OK.
Which is just useless busywork. mypy obviously has code to tell when a given branch or path always raises, or else the first example would be an error too. But it won’t use that to figure out that the second example always raises.
To me annotating error_handler with NoReturn sounds like a very reasonable thing to do, both as documentation (I would generally expect a function to return in at least some cases, and if you already have that information in your doc comment, why not add it in the type signature?) and to ensure that error_handler will never have a code path that does not raise.
I agree with you on the type narrowing. Unfortunately it’s not an easy problem to solve, but it is annoying and it comes up often.
My issue with the NoReturn thing is the inconsistency. If mypy is going to assume that code which always raises an exception still needs an explicit “I don’t return a value” annotation, it should always assume that for all such code. The fact that it selectively sometimes says “this always raises without returning, that doesn’t affect the return type” and sometimes says “this always raises without returning, that affects the return type” is what gets me.
I think that in particular is an issue with mypy and not with Python type checkers in general. The post has a section specifically about this (search for “Implicit Type Inference”).
mypy chooses not to infer return types (probably because it was one of the first type checkers and it was designed not to generate too much noise in unannotated codebases?) and other things that it very reasonably could with the information it has.
Other type checkers make different choices. I know that Pyright does infer return types, and the post mentions that Pyrefly also does.
it should always assume that for all such code
You’re basically asking for checked exceptions or exhaustively checked sum types now. I don’t disagree with their utility, but Python almost definitely won’t get either.
Let’s recap. Right now if I write this, the type checker is happy:
def my_function(input: int) -> int:
    if input == 3:
        return 3
    raise ValueError("Input has to be 3")
Even though the implicit “else” branch of this function’s body does not contain a return statement, the type checker sees that it always raises, so it does not treat that branch as having an implicit return None.
But the instant I change to this:
def my_function(input: int) -> int:
    if input == 3:
        return 3
    error_handler(ValueError, "Input has to be 3")
the type checker does assume an implicit return None, and tells you so: it will say the correct return type of my_function is int | None. It does this even though all paths through error_handler raise an exception, and even though it has to examine the body of error_handler as part of its run.
The only fix for this is to explicitly annotate error_handler’s return type to be NoReturn or Never, which the type checker then accepts. That is actually getting close to checked exceptions, because it means the fact that a function always raises is treated as part of its type signature. But what I want is to be able to leave that off and have the type checker realize “oh, error_handler always raises, so it doesn’t affect the return type of my_function”.
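A sketch of the fix under discussion (the body of error_handler is invented here for illustration; the thread never shows it):

```python
from typing import NoReturn

def error_handler(exc_type: type[Exception], message: str) -> NoReturn:
    # Centralized handling: log, wrap, or re-raise -- every path raises.
    raise exc_type(message)

def my_function(value: int) -> int:
    if value == 3:
        return 3
    # With the NoReturn annotation, mypy accepts that this call never
    # falls through, so my_function's return type stays plain int.
    error_handler(ValueError, "Input has to be 3")
```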
I like typing in Python and I never found any success with mypy. Perhaps try basedpyright or, in the near future, one of these two Rust tools.
My personal experience has been the reverse: although there are certain errors that linters catch (non-existent names, for example), generally mypy and pyright catch more real errors and produce fewer false ones.
The reason list is invariant is because otherwise this would typecheck:
x: list[str] = ["one", "two"]

def add_none(l: list[str | None]):
    l.append(None)

add_none(x)
# oops, now there's a None in my list[str]
Also, as of mypy latest, this is accepted (trying it out here):
from typing import TypeGuard

def guard_str(value: list[str | None]) -> TypeGuard[list[str]]:
    """
    Narrowing type guard which indicates whether the given value
    is a list of strings.
    """
    # this could also be all(value) or all(v is not None for v in value)
    return all(isinstance(v, str) for v in value)

def my_function(items: list[str | None]):
    if not guard_str(items):
        return
    print([item.startswith("http://") for item in items])
I’m aware of the principle that covariant types should be read-only, and the reason for it.
If you go read my full older comment you’ll see why TypeGuard – which does not care about generic type variance – was not actually suitable for my use case, because it only narrows one branch of the conditional. To narrow both branches you have to use TypeIs instead, and TypeIs does care about variance, which means it cannot narrow list[str | None] to list[str], which then means you have to re-annotate to a covariant container type and accept the resulting immutability.
Hopefully you will agree that needing to keep in one’s mind all the combinations involved here (TypeGuard can narrow, but only in the wrong branch; TypeIs can narrow on both branches but can’t narrow this type; TypeIs can narrow a Sequence but then it becomes immutable in the type-checker’s eyes) and implement functions just to appease the checker is a bit cumbersome for what ought to have been a simple idiomatic Python statement, yes?
Because, remember, this whole mess kicked off for me because mypy cannot interpret all() as a narrowing construct.
What I wanted was:
my_list: list[str | None]
# populate the list...
if not all(my_list):
    # Bubble up an error because items were missing...
# Beyond this point my_list contains only non-None values, by definition of all()
But mypy cannot do that, hence the whole rigamarole of trying TypeGuard and then TypeIs, etc.
I like type annotations in Python, but also agree with your comment. I feel the type checking ideas are heavily borrowed from earlier statically typed languages, rather than explored as new ideas that would provide static checking utility for Python’s existing patterns. To take your example:
def f(a: list[str | None]):
    for item in a:
        if item is None:
            return
    a[-1].lower()
I get a type error on lower() even though this is a valid program. The issue being the static checker does not “understand” the loop and its effect on the type of a. Maybe we should explore some form of abstract interpretation for type checking to make it understand patterns such as the above? We can kinda work around this with a method:
def validated(a: list[str | None]) -> list[str]:
    for item in a:
        if item is None:
            raise ValueError()
    return a
However validated won’t type check, but we can hack that with a cast, maybe? I’ll note there is some abstract evaluation logic in type checkers because this checks OK:
b = [item for item in a if item is not None]
b[0].lower()
Would be nice if it were possible to extend type checking so it understands more such patterns.
Isn’t assert isinstance(whatever, str) or whatever = typing.cast(str, whatever) an OK workaround for what you encountered? Sure, it’s not pretty, but (all?) type checkers accept it.
I wish the article would also benchmark pyre (the OCaml version of pyrefly). This might be an interesting comparison of two ML languages ;-)
I read elsewhere that they benchmarked them and found Pyrefly is much faster than Pyre for Instagram codebases (100s of millions of lines), but I strongly suspect it’s a case of second-system syndrome.
I find the efforts of type-checking Python weird. I mostly prefer Python because it gives me incredible freedom in how I can structure my programs and abstractions, while keeping it fairly simple. Adding type-checking essentially puts limits on how I can abstract without insulting the type checker, and the reality is that Python typing is playing catch-up to existing statically-typed languages. If I need to restrict myself to a subset of typing from statically-typed languages, why then would I use Python rather than those other languages?
I choose Python because I do stuff that is hard/impossible to do “properly” with statically typed languages that exist today. I have stuff that would need dependent types to be properly statically typed - and that’s still an “active field of research” type thing - it’s not coming to Python any time soon. I have stuff where I’m subclassing without supertyping - it’s great for incrementally building up code from a base case to more and more complex cases - but everyone seems to assume that Liskov substitution has to be king, and I don’t think many statically typed languages even allow that. If I needed the benefits of typed Python, I wouldn’t use Python, but rather one of the statically typed languages that has better performance, better typing guarantees, always fully typed 3rd party libraries (simply as a consequence of everything being typed), and so on.
why then I would use Python rather than those other languages?
Sometimes it’s the only option. It’s what is already in use. It’s what is mandated by corporate. And at the end of the day you just want maintainable code on a shared base.
I take a light hand with type checking. Often, I only enable it for my IDE. And when I’m doing exploratory coding, I rarely annotate anything.
But when a project hits a point where I expect to set it aside for a long time, when I have enough time to do so, I like to go through and add as many annotations as I can, to help future me come back to it later.
I treat it as an aid to documentation and testing, essentially, and I find it plenty useful. Beyond that, pydantic uses type annotations for some really good and useful things. I find that it’s made good serialization/deserialization about 10x easier. And it’s the full reason that django-ninja or django-shinobi are so much easier to work with than DRF.
I think if you understand it from an angle like this, it’s much less of a weird effort.
Adding type-checking
Perhaps a nitpick: These tools do not add type-checking to Python, because it’s already there. Python is strongly typed. What they add is moving the type errors from runtime to build time.
As an example, say that you had a field in a dataclass that was a string but behaved like an enum. You refactor that to be an enum, but forget to change one string application to the new enum, and oops you don’t have unit test coverage for that part. Do you want to see this error in your editor, CI development pipeline, or production two days after it had caused numerous failures already? With static type checking, it will happen in one of the first two.
Python’s answer to problems like this has traditionally been an effort to have 100% code coverage by unit tests. But that’s another maintenance burden for a thing these tools can give you more easily.
In my mind tools like these make it possible to use Python for many kinds of critical things.
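A minimal sketch of the refactor scenario described above (Status and Account are invented names for this example):

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    ACTIVE = "active"
    DISABLED = "disabled"

@dataclass
class Account:
    status: Status  # refactored from a plain str field

# A static checker flags a leftover string assignment at build time:
#   Account(status="active")   # error: expected Status, got str
account = Account(status=Status.ACTIVE)
```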
Adding type-checking essentially puts limits on how I can abstract without insulting the type checker,
There’s always the Any type though, and all kinds of abstraction possibilities in the typing module.
I wouldn’t use Python, but rather one of statically typed languages that has better performance, better typing guarantees, always fully typed 3rd party libraries (simply as a consequence of everything being typed), and so on.
Yeah, that’s probably a good idea if you can make that decision. Learning to use all the various kinds of things in the typing module is not easier than just learning Rust.
Perhaps a nitpick: These tools do not add type-checking to Python, because it’s already there. Python is strongly typed. What they add is moving the type errors from runtime to build time.
Strong disagree there. There is very little type-checking actually happening even during runtime. Yes, Python is strongly typed - but that doesn’t mean that everything uses and relies on that. There’s plenty of structural typing, which is the anti-default of these efforts; barely anything cares if you violate the Liskov substitution principle, even though that’s an immediate type error from these tools with no good possibilities of resolution, etc. It’s fairly rare for most Python code to sprinkle isinstance() everywhere to do type checks - therefore, there’s no type checking that’s already there. Type system - sure, it exists, but type checking is purely dependent on the libraries to implement.
There’s always the Any type though, and all kinds of abstraction possibilities in the typing module.
If I need to throw in a bunch of annotations that are plainly misleading to satisfy a type checker, I’d rather not have a type checker at all. Saying that a field is Any when there is only a small set of values it could be, rules for which could be explained in a single-line comment, but that the static type system is unable to express, is actively harmful for reading the code. It’s a deliberate error, made only to please a dumb tool that cannot understand the true meaning. The “it makes the code more understandable” reason that static typing is often marketed with is actively disregarded in this case.
Yeah, that’s probably a good idea if you can make that decision. Learning to use all the various kinds of things in the typing module is not easier than just learning Rust.
FWIW, Rust would be behind a bunch of other languages for me as a statically-typed language to move to from Python - its main problem is that memory management is still largely the programmer’s concern (lifetimes are just fancier ways to manage memory), which is a choice made for reasons that don’t matter for most stuff that’s written in Python. Static typing with automatic memory management is a much more comfortable point for a lot of “glue programming”, and these days it often seems to be missed in the discourse because Rust is so loud. You don’t need to torture yourself with complex lifetimes and restrictions on data structures to get memory safety - automatic memory management has been around for decades now and it is pretty damn nice to use, so why not use it when developer speed is more important than optimal resource utilization?
Strong disagree there. There is very little type-checking actually happening even during runtime. Yes, Python is strongly typed - but that doesn’t mean that everything uses and relies on that.
I’m not sure if we’re disagreeing? “doesn’t mean that everything uses and relies on that” is the thing that these tools aim to fix.
Saying that a field is Any when there is only a small set of values it could be, rules for which could be explained in a single-line comment, but that the static type system is unable to express, is actively harmful for reading the code.
I think situations like this are more rare than you seem to be suggesting. Usually it’s possible to nail down the types pretty well using just standard notation, and when it’s not, Any is not weaker than omitting the type altogether.
You don’t need to torture yourself with complex lifetimes and restrictions on data structures with Rust to get memory safety
Clone() and Rc/Arc aren’t that horrible once you get used to them, but for everything I do, I would love a Rust with GC. I don’t think such a thing exists, and I think I follow the space on a regular basis. FWIW, Kotlin is close, but is hampered by the subjectively horrible developer experience enforced by IntelliJ.