14 Advanced Python Features
61 points by knl
The “Proxy Properties” idea doesn’t really work. It appears to work in the example given only because print() calls repr() on an object. If you try something like:

c.value + "xxx"

it just throws a TypeError, because c.value is not a string.
The author recognises it’s not production code. The fuller version linked to depends on lazy-object-proxy to do the hard bit here, which is making a wrapper object that appears to behave like whatever it is wrapping. It’s also the kind of thing you should really avoid if at all possible.
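To see why, here's a minimal sketch (mine, not the article's) of a naive proxy: it forwards repr(), so print() looks convincing, but no other operation reaches the wrapped value.

class Proxy:
    """Naive proxy: forwards repr() and nothing else."""
    def __init__(self, factory):
        self._factory = factory

    def __repr__(self):
        return repr(self._factory())

c = Proxy(lambda: "hello")
print(c)     # looks right: print() falls back on __repr__
c + "xxx"    # TypeError: Proxy defines no __add__, so the illusion breaks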
- Typing Overloads
I think this is actually an anti-pattern in Python. You can’t create real overloads, it’s just sugar for the type checker, but the implementations still end up very messy. You’re better off creating separate functions for separate types.
- Protocols
Additional quick tip: add the @runtime_checkable decorator if you want isinstance() checks to work alongside your Protocols!
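A quick sketch of the combination (the class names here are my own, for illustration):

from typing import Protocol, runtime_checkable

@runtime_checkable
class Closeable(Protocol):
    def close(self) -> None: ...

class Conn:  # note: never inherits from Closeable
    def close(self) -> None:
        print("closed")

print(isinstance(Conn(), Closeable))  # True -- matched structurally at runtime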
Oooo. That is very neat.
- Python Futures
I wonder why Python made two Future types that aren’t compatible with each other? It seems asyncio.Future is also bound to the running event loop?
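For what it’s worth, the standard library does ship a bridge between the two: asyncio.wrap_future() adapts a concurrent.futures.Future to the running loop. A small sketch:

import asyncio
from concurrent.futures import ThreadPoolExecutor

async def main():
    with ThreadPoolExecutor() as pool:
        cf = pool.submit(sum, [1, 2, 3])  # concurrent.futures.Future
        af = asyncio.wrap_future(cf)      # adapted into an asyncio.Future
        print(await af)                   # 6

asyncio.run(main())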
- Proxy Properties
My experiences with these in libraries have been pretty poor as the type checkers don’t like them.
Some cool features mentioned, but a lot of them don’t feel very advanced to a day-to-day Python developer.
You can’t create real overloads
what do you mean by “real” here? The entire Python type hinting system is in a sense not “real”.
def f() -> int:
    return 5

def g(v: str) -> str:
    return f"ok: {v}"

print(g(f()))
Type hints would tell you this program has a type error but it doesn’t actually have a runtime type error at all, it just has a poorly-written type hint. Meanwhile you could be adding type hints to an existing Python program. This (terrible) program without type hints shows what I mean very succinctly:
def f(a):
    return a * 2

print(f(10))
print(f("c"))
You can add accurate type hints with overloads without having to refactor your program and update every callsite:
from typing import overload

@overload
def f(a: str) -> str: ...
@overload
def f(a: int) -> int: ...
def f(a: str | int) -> str | int:
    return a * 2
Doing what you said, you’d duplicate the function definition or refactor the function into three functions (two user-facing, the third to centralize the commonalities to avoid duplication), and then go through the process of refactoring your whole app. That just adds friction to the process of adding type hints or doing certain types of refactors.
The existence of a type hinting engine doesn’t actually turn Python into a statically typed language even if you do use type hints everywhere.
Real in the way they’re enforced. They’re open to bugs and mistakes in the implementation. You picked something that should just be a generic, not an overload:

def f[T: str | int](a: T) -> T

It doesn’t require any discriminant or logic.
I mean that sounds smart and might seem correct to an uninformed observer that doesn’t try to run it, but try to actually run that and you’ll see that it errors in pyright and mypy. Unless I’m misunderstanding your comment, I think you’re suggesting this definition:
def g[T: str | int](v: T) -> T:
    return v * 2
Pyright gives this warning:
not_real.py:17:12 - error: Type "str | int" is not assignable to return type "T@g"
Type "str | int" is not assignable to type "T@g" (reportReturnType)
1 error, 0 warnings, 0 informations
and mypy gives this warning:
not_real.py:17: error: Incompatible return value type (got "str | int", expected "T") [return-value]
Found 1 error in 1 file (checked 1 source file)
Inside of the body, v’s type is T (which in practice means str | int because it has so far not been narrowed), and v * 2 is str | int. Generics are just type hints, they’re not actually doing monomorphization or anything like that.
And besides, I could have picked this type of function as the original function to which we want to add type hints:
def f(a):
    if isinstance(a, str):
        return int(a)
    return str(a)
which you’d type hint as follows:
from typing import overload

@overload
def f(a: int) -> str: ...
@overload
def f(a: str) -> int: ...
def f(a: str | int) -> str | int:
    if isinstance(a, str):
        return int(a)
    return str(a)
They’re open to bugs and mistakes in the implementation.
That’s sorta an argument against Python’s type hinting systems in their entirety. Once you use Any explicitly, or use Any implicitly by type hinting some but not all parameters, or use Any implicitly by leaving off the type parameter of a generic type, or use cast, or call any function that does those things, guess what, the call stack isn’t perfectly type checked. Using @overload means you’re adding a static hint to something that is fundamentally dynamically typed; @overload doesn’t introduce dynamic dispatch or monomorphization and doesn’t pretend to. It’s not a statically typed language. The bar isn’t “is this as rigorous as Rust/Haskell/whatever”, the bar is “does this meaningfully help us improve the quality of existing untyped Python codebases”.
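To make the Any point concrete, a minimal example of my own: one Any anywhere and the checker waves the whole call through.

from typing import Any

def load() -> Any:
    return "not an int"

x: int = load()  # no checker error: Any is assignable to anything
print(x + 1)     # TypeError at runtime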
I found @overload useful for typing stuff like __getitem__ on a custom starlette Request object in my $JOB codebase. There are usually better ways to structure your code than resorting to overloads, but I see it in the same way as the reason why TypeScript’s type system is Turing-complete: it’s good for giving types to complex APIs that were not developed with types in mind.
I’m currently trapped in a love/hate war with __getitem__ and pyright where it wants me to support slices, or ints, but not both, and attempting to change its definition doesn’t do what I want out of it. It might have something to do with subclassing Sequence.
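For what it’s worth, the usual way to keep pyright happy there is to overload __getitem__ for both int and slice, mirroring the overloads Sequence itself declares. A sketch (the Window class is made up):

from collections.abc import Sequence
from typing import overload

class Window(Sequence[int]):
    def __init__(self, data: list[int]) -> None:
        self._data = data

    @overload
    def __getitem__(self, index: int) -> int: ...
    @overload
    def __getitem__(self, index: slice) -> "Window": ...
    def __getitem__(self, index: int | slice) -> "int | Window":
        if isinstance(index, slice):
            return Window(self._data[index])  # slices return a new Window
        return self._data[index]              # ints return an element

    def __len__(self) -> int:
        return len(self._data)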
How does it work at runtime if it’s just sugar for the type checker?
The last definition of a name “wins” at runtime, so you define all the overloads as annotated stub functions and then write the real one.
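A tiny demonstration of “last definition wins” (my own example, no typing machinery needed):

def f(): return "first"
def f(): return "second"  # rebinds the name; the first f is discarded
print(f())                # prints "second"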
https://docs.python.org/3/library/typing.html#typing.runtime_checkable
runtime_checkable() will check only the presence of the required methods or attributes, not their type signatures or types.
If you’re referring to overloads, it works at runtime because there’s only one real implementation with a bunch of if statements to pick the correct argument set.
I appreciate this article, and the comment I’m about to make is not a reflection of it. I love that the author wrote this and shared it with us.
One of my reactions, though: about half or more are about type system stuff, and after 4 companies in Python with very different codebases, I’m largely of the opinion that it’s a waste of time, at least for most growing startups. If you’re Shopify and developing Sorbet, great. If you’re Instagram and can staff a dedicated team and already have code rewriting tools (and your company is mad profitable), go for it. But I see Python typing as happening at two different times:
“My company already has a large codebase and employee size, we think type hints will reduce bugs.” a) This is very hard to prove the efficacy of, and there are probably better tasks for your team to tackle for code quality or developer experience (e.g. deploy times), and b) most rollouts of Python’s type system will be too gradual to be noticed, and/or you may have a ton of libraries in place that make it less effective.
“We’re greenfielding, but we’re going to do it right and add types to everything.” If you’re greenfielding, and you want types, why pick Python? Its type annotations and their implementations offer among the fewest guarantees even when everything is fully annotated, and there’s no backstop to prevent you from shipping code that’s not annotated. Even if you configure your linter and tools to block commits and deploys until you get 100% coverage, well, at that point, you might as well use Go or Java or something. Additionally, keeping that constraint while adding libraries will be a challenge.
I’m pretty polyglot and acknowledge that “just use Go (or Lord, OCaml 😅)” is not really an option for most people and shops. So keep using Python, but per all the above, is codebase quality really substantially going up with a type system with so many backdoors? Most places, I see people adding Dict[str, Any] everywhere, or abusing **kwargs to make a plenty big mess. “Then simply don’t make such a big mess.” Yeah, okay. Tell C++ programmers to simply not have data races.
There’s a lot I love about Python and wonderful ways to use it, I just wish we’d accept its tradeoffs instead of trying to turn mud into steel once we get to skyscraper-scale. I guess it feels like someone trying to add features to turn C into a scripting language – okay, if people really want that, but I think it’s a really poor fit.
F-strings can be nested, formatting may be parameterized, and they even support walrus assignment!

from datetime import datetime

now = datetime.now()
syms = {"Date": ["%Y", "%m", "%d"], "Time": ["%H", "%M", "%S"]}
seps = {"Date": "-", "Time": ":"}
for y in ["Date", "Time"]:
    print(f'{f"{(x := y)}: {now:{seps[x].join(syms[x])}}":~^20}')
prints
~~Date: 2025-04-23~~
~~~Time: 12:34:56~~~
Python is the blunt instrument I use to stitch things together for automation when bash runs out of runway, so I minimize what I use. I use context managers from time to time, but for-else, pattern matching, and short circuit evaluation all look like great things to remember next time I’m gluing something together.
I’m going to start using the structural pattern matching features right away, although I’m still not entirely comfortable with that walrus operator.
Protocols look interesting, I hadn’t seen those before. I was wondering how they interact with multiple inheritance… and the docs say you can use that to define intersections of protocols; you can also compose them with the pipe operator to get unions. Cool!
although I’m still not entirely comfortable with that walrus operator.
Personally, I use them often for branching where I need the result of the check. e.g.,

import os

if var_exists := os.getenv("SOME_VALUE"):
    ...  # do the thing

or

match some_value := conditional():
    case _:
        ...  # every branch can use some_value without an `as name` binding

It’s small, but when used consistently, reduces the mental overhead of temporary bindings. Unfortunately, it does not stop the name leaking into the enclosing namespace, because of Python’s scoping.
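To make that scoping caveat concrete (my own example): the walrus target stays bound after the block, unlike a block-scoped variable.

import os

if home := os.getenv("HOME"):
    print("found:", home)

print(home)  # still bound here: := assigns in the enclosing scope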
I don’t know whether this speaks more to the complexity of the code I have to deal with at work or to my propensity for metaprogramming, but in the ~1.5 years I used Python due to my job I found myself reaching for most of these features.
Finally, as part of the 3.12 typing changes, Python also introduced a new concise syntax for type aliases!
# OLD SYNTAX - Python 3.5 to 3.9
from typing import NewType
Vector = NewType("Vector", list[float])
# OLD-ish SYNTAX - Python 3.10 to 3.11
from typing import TypeAlias
Vector: TypeAlias = list[float]
# NEW SYNTAX - Python 3.12+
type Vector = list[float]
This does not seem right? The first is a NewType, which creates a distinct type rather than an alias. Only the last two are type aliases.
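Right: a minimal illustration of the difference (my own example): a NewType is a distinct type that a checker will not conflate with its base, while an alias is just another name for the same type.

from typing import NewType

UserId = NewType("UserId", int)  # distinct type to a checker
Meters = int                     # plain alias: interchangeable with int

def fetch(uid: UserId) -> None: ...

fetch(UserId(42))  # OK
fetch(42)          # checker error: "int" is not "UserId" (runs fine, though)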
It’s been a while, but the (default) way Python handles positional vs. keyword parameters has always seemed the worst of both worlds. Letting the caller decide which to use puts the API developer in the position of having to support both call styles, limiting refactoring options.
This is also assuming I understand how it works—I haven’t written Python in a while, so I may be misremembering.
I’m not sure I understand this complaint. What is a better design?
If I am the author of some function, am I not exposed to call-site choices only in the event of changes or in the event of variadic arguments (usu. when forwarding)? In the former case, can I not use positional-only or keyword-only arguments to eliminate this problem? In the latter, can I not use inspect.signature to eliminate the problem?
In other words:
def f(x, y): pass
# callee (`f`) cannot distinguish between:
f(123, 456)
f(123, y=456)
f(x=123, y=456)
As the author of f, if I want to change the order of arguments, this change affects callers.
def f(y, x): pass
f(123, 456) # breaks
f(123, y=456) # breaks
f(x=123, y=456) # works
As the author of f, if I want to change the name of arguments, this change affects callers.
def f(ecks, why): pass
f(123, 456) # works
f(123, y=456) # breaks
f(x=123, y=456) # breaks
I can use positional and keyword arguments to eliminate this problem: positional-only arguments can change names, keyword-only arguments can change positions. (With an import hook and a decorator, we could even design a utility that would allow us to version functions and adapt old API callers to the new API, but such a utility is probably not that useful in practice; more just calling convention is likely to change.)
def f(w, x, /, *, y, z): pass
def f(why, ecks, /, *, z, y): pass
f(123, 456, y=789, z=999)
In the case of variadic arguments, I can observe changes in calling convention.
def f(*args, **kwargs): pass
f(123, 456) # args=(123, 456); kwargs={}
f(123, y=456) # args=(123,); kwargs={'y': 456}
f(x=123, y=456) # args=(); kwargs={'x': 123, 'y': 456}
However, what could I reasonably do with the arguments passed to f if there is no constraint on them, and I don’t even know what they might be? In general, the most useful thing I can do is forward them to another function which can make this distinction.
def f(*args, **kwargs): return g(*args, **kwargs)
def g(x, y): pass
f(123, 456)
f(123, y=456)
f(x=123, y=456)
This situation is covered by a previous example.
So the situation I want to consider is the use of *args and **kwargs where I observe (likely surgically) some portion. In this case, it may appear that I do have to know how the argument was passed in.
def f(*args, **kwargs):
    kwargs['y'] = kwargs.pop('y', 0)  # default
    return g(*args, **kwargs)

def g(x, y): pass

f(123)         # works
f(x=123)       # works
f(123, y=456)  # works
f(123, 456)    # breaks
However, this can be solved with inspect.signature.
from inspect import signature

def f(*args, **kwargs):
    bound = signature(g).bind_partial(*args, **kwargs)
    bound.arguments['y'] = kwargs['y'] = bound.arguments.get('y', 0)  # default
    # optional: preserve original calling convention
    # args = (v for k, v in bound.arguments.items() if k not in kwargs)
    # kwargs = {k: bound.arguments[k] for k in kwargs}
    return g(*bound.args, **bound.kwargs)

def g(x, y, z=999): pass

f(123)         # works
f(x=123)       # works
f(123, y=456)  # works
f(123, 456)    # works
What is a better design?
Presumably one where / and * are implicit, rather than opt-in? (Not sure, just guessing.)
A better design would be one where I don’t have to add those separators because keyword and positional parameters are fully distinct (e.g. Ruby—at least in newer versions). Are those separators widely used in Python libraries? The suggested solution feels like a lot to just change a signature.
the (default) way Python handled positional vs. keyword parameters has always seemed the worst of both worlds
the single biggest mistake in the language imho