Making an LSP for great good
19 points by polywolf
19 points by polywolf
This whole series looks great! There’s a dearth of tutorials on building a statically-typed language.
shameless plug alert I gave a talk at a recent RustNYC about building a language server atop of a query-based compiler. It focuses on Pico, which is our replacement for Rust Analyzer's salsa. (A small and simple salsa, if you will.) https://drive.google.com/file/d/1qoJidpwT3gb2exQX18vlSjxZNzb2VRfT/view?usp=sharing&t=5091
Anyway, an advantage of query based compilers that the author misses is that it lets you respond to queries, even if your code is in a broken state. Consider let x: bool = 123; let y: bool = true. We don't need to validate the let x statement when we (say) hovering on y, and displaying its type. But (and especially for complicated situations), it's hard to imagine how one would write a batch compiler and yet be able to do this.
Great post!
But (and especially for complicated situations), it's hard to imagine how one would write a batch compiler and yet be able to do this.
I don't agree with this, in fact I think it may even be the opposite.
In a batch compiler all you need to do is not bail out on errors! This is typically required anyway because it is desirable to report more than just the first error.
For some types of errors (e.g. type inference failure) it's helpful to have a more global view of the program in order to improve the error quality. In a query-based approach this is harder to do without essentially reverting to a batch analysis.
Neither style helps you with the actually hard kind of error recovery, which is parse errors.
The benefits of the query-based approach are incrementality (performance) and possibly discipline in enforcing a more structured/consistent approach.
You can see this for our compiler pass queries, where they're all Uri -> Ast & Errors. They are each essentially the batch pass wrapped up as a query. That is in part because our language lacks top level items, so there isn't a more narrow granularity than the entire expression.
But even in languages with items you still have something that is effectively a batch compiler pass and runs a pass over all of a thing. The value of the query model is you can tune what that thing is to achieve better incremental reuse. If I run my pass over my entire program thats a batch compiler. But if instead my thing is a module, or an item in a module, you get better incremental reuse while still keeping multiple errors.
This is contingent on language design though. If you're a haskell, for example, with global type inference, item granularity is difficult to achieve because you need to run type inference over all the items that impact each other's inference. Rust doesn't have this problem because all top level items require signatures and so never impact inference of other items.
Huh. I think you're right.
Typically, a compiler is divided into a series of stages: say parse -> typecheck -> other validations -> generate code, and the errors emitted tend to come from a single stage. Take the parsing stage. The outcome of this stage is some data structure that contains "all of the parsed items", i.e. a Result<ParsedStuff, Vec<Diagnostic>>, rather than the ParsedStuff containing Diagnostics. Hence, it's awkward to emit errors from multiple stages.
One could solve this by having ParsedStuff contain results, which is effectively the "materialized" version of "running all of the parse queries", then having any errors infect any future calculation. e.g. we attempted to typecheck something that had failed to parse, it would be stored as an Err.
I don't remember why I didn't consider that, tbh. Perhaps I didn't consider that because the thought of making data structures generic over the error type (either Diagnostic or !) hadn't occurred to me, thus the idea of having e.g. an AST where every record is an error didn't occur to me.
Anyway, clearly "hard to imagine" is an overstatement. "Potentially more complicated to report errors from multiple stages" is much better wording. Or rather, "dealing with objects with Results at every level is a hassle."
We talk about this more in the previous posts, but all our front-end passes produce multiple errors for this exact reason. I can try to call it out better for the queries specifically but this is definitely an intended advantage of queries.