Runtime validation in type annotations

8 points by natfu

rtpg

After using a lot of these I find myself really disliking the "type annotation"-based APIs like Pydantic.

Way back when, before django-stubs, we got typing for our Django models working through a simple trick: making models.CharField and friends descriptors.

Doing that means you don't have to do anything magical, you don't have to "lie" at some level with stuff like foo: int = MagicalIntFieldConfigurationObject(option_1='a'), and it lets inference work in a decently straightforward way.
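
For anyone who hasn't seen the trick, here is a minimal sketch of the idea. It is not Django's actual implementation: CharField and Article below are hypothetical stand-ins, and the overloads on __get__ are what let a type checker infer str on instances and the field object on the class.

```python
from __future__ import annotations

from typing import overload


class CharField:
    """Hypothetical stand-in for a Django-style field (not Django's real class)."""

    def __init__(self, max_length: int) -> None:
        self.max_length = max_length

    def __set_name__(self, owner: type, name: str) -> None:
        # Remember which attribute this descriptor was assigned to.
        self.name = name

    @overload
    def __get__(self, instance: None, owner: type) -> CharField: ...
    @overload
    def __get__(self, instance: object, owner: type) -> str: ...

    def __get__(self, instance: object | None, owner: type) -> CharField | str:
        if instance is None:
            # Class-level access (Article.title) returns the field itself.
            return self
        # Instance-level access (article.title) returns the stored string.
        return instance.__dict__[self.name]

    def __set__(self, instance: object, value: str) -> None:
        if len(value) > self.max_length:
            raise ValueError(f"{self.name} is longer than {self.max_length} characters")
        instance.__dict__[self.name] = value


class Article:
    # No annotation that disagrees with the assigned value: the descriptor's
    # own __get__ signature tells the type checker what attribute access yields.
    title = CharField(max_length=20)

    def __init__(self, title: str) -> None:
        self.title = title
```

Here Article("hello").title is inferred as str, while Article.title is inferred as CharField.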

At a higher level, I tend to find runtime validation of type sigs to be something introduced by people who ignore the "point" of all this type theory. The point of the type theory is that if you have foo(): int, then you don't need to validate the return value of foo()! And when you do value = foo(); bar(value), you don't need to re-validate that value is an integer!

You still need to validate untrusted input, but Pydantic will check that, say, 4 is an integer when you pass it in, despite you having proven at a static level that the literal 4 is an integer.

And when you get to more complicated validation, the APIs quickly become way uglier than old-school model validation libs in Python.

natfu

I agree with you in part: validation can be done via descriptors quite easily. Where I mildly disagree is on the validation itself. Validation in Pydantic & friends via Annotated is usually reserved for things we can't encode in the type system, like a value range, so it has to happen at runtime. Also, considering the main value of Pydantic validation is checking things like request.json() -> Any, I think it's a bit unfair to say we have proven anything about the input value at that point.
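
To make the value-range case concrete, here is a rough stdlib-only sketch. It is not Pydantic's API: Range, validate_fields and Signup are hypothetical names made up for illustration, and the untyped dict stands in for something like a decoded JSON body.

```python
from dataclasses import dataclass
from typing import Annotated, Any, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Range:
    """Hypothetical metadata marker: the value must satisfy lo <= value <= hi."""

    lo: int
    hi: int


class Signup:
    name: str
    # The type system can say "int"; the range only exists as runtime metadata.
    age: Annotated[int, Range(0, 150)]


def validate_fields(cls: type, raw: dict[str, Any]) -> dict[str, Any]:
    """Check an untyped mapping against the annotations declared on cls."""
    checked: dict[str, Any] = {}
    for name, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is Annotated:
            base, *metadata = get_args(hint)
        else:
            base, metadata = hint, []
        value = raw[name]
        if not isinstance(value, base):
            raise TypeError(f"{name}: expected {base.__name__}")
        for meta in metadata:
            if isinstance(meta, Range) and not (meta.lo <= value <= meta.hi):
                raise ValueError(f"{name}: {value} is outside {meta.lo}..{meta.hi}")
        checked[name] = value
    return checked
```

Calling validate_fields(Signup, {"name": "Ada", "age": 200}) raises ValueError, which no static checker could have caught from the dict alone.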

Something I do enjoy about the Annotated approach is that you can compose validation callables quite easily and you don't need to create specific validator objects all the time. It's also easier to see which validators are applied to which type imo, whereas with a descriptor you need to look up which attributes are actually descriptors and which aren't; but that's mostly a taste thing.
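
A small sketch of what I mean by composing callables, using only the standard library. The helper names (non_empty, max_len, lowercase, run_validators) are invented for this example; real libraries wrap the callables in their own marker types, but the shape is similar.

```python
from typing import Annotated, Any, Callable, get_args, get_origin


def non_empty(value: str) -> str:
    if not value:
        raise ValueError("must not be empty")
    return value


def max_len(limit: int) -> Callable[[str], str]:
    # Returns a plain function, so no dedicated validator class is needed.
    def check(value: str) -> str:
        if len(value) > limit:
            raise ValueError(f"must be at most {limit} characters")
        return value

    return check


def lowercase(value: str) -> str:
    return value.lower()


# The validators sit right next to the type they apply to, in order.
Username = Annotated[str, non_empty, max_len(12), lowercase]


def run_validators(hint: Any, value: Any) -> Any:
    """Apply every callable found in an Annotated hint, left to right."""
    if get_origin(hint) is Annotated:
        _base, *steps = get_args(hint)
        for step in steps:
            if callable(step):
                value = step(value)
    return value
```

With this, run_validators(Username, "Alice") passes through all three steps, and reusing or reordering them is just a matter of editing the Annotated list.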

Small tangent, but I think it's better to validate early (at initialization time) and then have a frozen object that needs no further validation, in the "parse, don't validate" sense. So in general, I would avoid doing runtime validation in objects all the time, regardless of the approach chosen.
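
Roughly what I have in mind, as a sketch with a frozen dataclass. Port and describe are hypothetical names: the checks run once in __post_init__, and everything downstream just trusts the type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    """A TCP port that has already been checked; instances cannot be mutated."""

    number: int

    def __post_init__(self) -> None:
        # Validate exactly once, at construction time.
        if isinstance(self.number, bool) or not isinstance(self.number, int):
            raise TypeError("port must be an int")
        if not 1 <= self.number <= 65535:
            raise ValueError(f"port {self.number} is outside 1..65535")


def describe(port: Port) -> str:
    # No re-validation here: holding a Port is the evidence that the
    # checks passed, and frozen=True means it cannot have changed since.
    return f"listening on :{port.number}"
```

Untrusted input gets turned into a Port at the boundary; describe(Port(8080)) and any other consumer never needs to look at the raw value again.
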
- rtpg
  
  Something I do enjoy about the Annotated approach is that you can compose validation callables quite easily and you don't need to create specific validator objects all the time.
  
  This is a good point. I think it would be possible to do similar things with a Django-style validator model if you wrote the library to take advantage of it (generics exist, after all!), but it's not the norm. Composability is nice.