5NF and Database Design

17 points by squadette

hyperpape

The previous piece on why explantions of 4NF are confusing is quite good: https://minimalmodeling.substack.com/p/historically-4nf-explanations-are.

ancienthero

I enjoyed both this and the article on 4NF. I'm not exaggerating when I say every single person I've spoken to that's taken a formal relational design course was left confused as to why 4NF and 5NF are even taught. It truly takes extra effort to make a design that's somehow 3NF but not 4NF or 5NF.

Your article on 4NF touches on the fact that these issues arised in the early days of relational databases when people were less familiar with the concept. These days, I think mere exposure to correctly designed schemas will expunge the possibility of making a non 5NF design from new students' minds. This is one of those scenarios where teaching materials shouldn't be so directly based on the original papers that inspired them.

hwayne

I think you could save the salesperson example by making it what they are allowed to sell. Louis is authorized to sell vacuum cleaners, and he can sell Acme products, so he can sell Acme vacuum cleaners. This is a little more generalizable, I think. For example, Louis could be licensed to drive certain vehicles and assigned to certain routes.

squadette

Your post on examples (https://buttondown.com/hillelwayne/archive/canonical-examples/) had a huge influence on my writing!

I like the "allowed" idea. E.g., a pilot is licensed to fly certain planes, and they are also cleared for certain flight routes (depending on difficulty). It's not clear though what is the connection between the route and plane type ("we need to fly this Boeing on the AMS—SIN rout")...

One thing even in the ice cream example that I don't like is that is that flavour is orthogonal to the brand. So far I could not find a satisfactory example of AB-BC-AC pattern that has clear real-world business application. Of course, this is I think part of the problem of explaining 5NF — no great examples.

alper

I don't think I've in any way understood what is meant by 5NF after scrolling through this.

squadette

That was not the goal of this text.

What I tried to achieve is to show how the patterns that arise around traditional 5NF explanations also arise naturally if you follow a completely different design path.

You're in a good company. Millist W. Vincent "A corrected 5NF definition for relational database design" (1997) (!) shows that the traditional definition was deficient. 5NF was introduced in 1979 (I was one year old then).

I find it plausible that if a concept this simple was unclearly defined for 18+ years it's not very useful per se, and we should attempt to find radically better explanation for the phenomena it's been trying to formalize, which I hoped to provide.
- hyperpape
  I think you've buried the headline in that case.
  
  Finally, we come to the conclusion that you don’t really need to involve 5NF to design your table schema. You begin with the logical model, and apply a textbook table design strategy that preserves normalization.
  
  This is the closest paragraph to a thesis statement, I think, but it's kind of wedged in the middle of a lot of text, and after reading it, I'm not 100% sure which thing you mean:
  
  You don't need 5NF
  
  Your tables should be in 5NF, but you don't have to think about 5NF to achieve that result.
unwind
So, if I'm reading this right, 5NF insists on not being overly-specific [1]. I like to think of it geometrically, like a chessboard: if I tell you about a "corner" and list g2 h2 g1 h1, then a few things become clear
1. if I'm talking about an entire subrectangle [2], I need only list the columns {g h} and the rows {2 1}, and the product is implied. Like lighting up a whole {ice cream brands} * {flavors} => {combinations}
2. hence, I should only list specific coordinates when I can't specify a whole subrectangle by its sides
3. the question is: does inclusion of {g2 h1 ...} imply {h2 g1}, according to the "business rules"?
4. it's only 5NF if you cannot imply some data -> other data in this sense ?
You could presumably write code that is or isn't 5NF, like for coord in coordinates: ... vs for r in rows: for c in cols: ... but nobody talks like that, I guess.

[1] presumably not "just" to reduce row count by some multiplicative factor? a la (M + N) < (M * N), but to prevent seeming inconsistencies where {g2 h2 g1} seems to omit {h1} but who would notice

[2] where a subrectangle isn't necessarily contiguous (rowsets and columnsets), unless you can rearrange cols/rows so that included ones are adjacent, like {even rows} * {left-half columns}. I just can't think of a better word than subrectangle even though the concept isn't exactly 2-D
andrewrogers

5NF is a subset of the cases graph databases are presumably optimized for.

The major technical distinction between relational databases and graph databases is the relational algebra operator they are organized around. A relational database is primarily optimized around selection and a graph database is primarily optimized around joins.

More broadly, a graph database is a relational database that supports join recursion. Which many relational databases do, albeit with internals optimized for selection operators.