How to use views for debugging tools to save time
10 points by chris_penner
10 points by chris_penner
I've worked at a company that heavily uses BigQuery, and creating user-defined functions (UDFs) for debugging is extremely helpful. Especially as analytical data tends to spread in many different places, with complex ways to query the information you need.
While it's a nice technique (which is creating SQL views to simplify day-to-day work) it's also something pointing in one of pains of traditional SQL DB projects — that is how uncomfortable for querying are databases with proper normalization. I like noSQL exactly for that, and sometimes use duckdb to store one big messy table and don't care about joins (if it fits into memory that is). For me multi-join queries are close to old-perl-write-only-code level of misery, and not that fun as old-perl-write-only-code.
I agree in the grand scheme of things, that is one aspect of SQL that sucks. That you can't define the full relationship between two tables, you can only assign a simple dependency(FK). You can't say this is a one-multi table, or a multi-multi table, etc. That makes using SQL a PITA, since you have to define those on every query you do.
I think I understand why they did it though, because you want to do different kinds of joins on the data depending on what you are trying to do, like in the happy path you always want it this way, but when doing debugging or whatever, you might need to do a different kind of join to see what is going wrong. I just wish they made the happy path easier, but such is life sometimes.
That is why I usually build a view across the tables that do all those joins, with the proper relationships, and then I can do everything against those view(s). In Postgres for example you can even write through the view, so inserts and updates happen to the appropriate underlying tables. That lets you build that big messy table that makes your life easier on the happy path.
I would argue using NoSQL doesn't buy you what you think it does. It pushes the problem into the application, and requires your code to be 100% correct, which if it's anything like our code, that's roughly a 0% chance of being true. When you have your CHECK constraints in the DB too, the data has a much better chance of being correct and you have a higher chance of catching errors.
For data that doesn't matter, it doesn't matter what you do, but if the data doesn't matter, why are you storing it in the first place?
Well, there are many levels in the stack though where these joins can be improved — there are views, but also a super-thin ORM can keep transparency to SQL semantics while allowing you to write helper functions for whichever parts of the query get boilerplate-y. But yes, pure low-level approach is painful.