Model Once, Represent Everywhere: UDA (Unified Data Architecture) at Netflix
9 points by driib
Upper is the metamodel for Connected Data in UDA — the model for all models. It is designed as a bootstrapping upper ontology, which means that Upper is self-referencing, because it models itself as a domain model; self-describing, because it defines the very concept of a domain model; and self-validating, because it conforms to its own model.
Are we sure this means something?
I am somewhat certain it does. The BFO book or a lecture may need to be consumed and understood before this fully makes sense, unfortunately. Upper ontologies are quite useful for preventing grave information-modelling mistakes (cf. BFO's continuants vs occurrents). This helps only if you decide to forego ontological nominalism (“these things will be called ducks because I said so”) and instead adopt ontological realism (“these things look similar and quack, therefore we will call them ducks”).
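The continuant/occurrent split alluded to above can be sketched roughly in code. This is only an illustration of the modelling distinction; the class names are mine, not BFO's normative identifiers:

```python
# Rough sketch of BFO's top-level split: continuants persist through
# time (a person, a duck), while occurrents unfold in time (a quacking,
# a screening). Class names are illustrative, not normative BFO IRIs.
from dataclasses import dataclass, field


@dataclass
class Continuant:
    """An entity that persists and can change while remaining itself."""
    label: str


@dataclass
class Occurrent:
    """An entity that happens or unfolds; it has temporal parts."""
    label: str
    participants: list = field(default_factory=list)  # continuants taking part


duck = Continuant("Donald")
quacking = Occurrent("quacking", participants=[duck])

# The mistake an upper ontology guards against: treating a process as if
# it were an object, e.g. giving 'quacking' a weight or a location. Here
# the type system at least forces you to say which kind of thing it is.
assert duck in quacking.participants
```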
I, however, have two nits about the statement you highlighted:
At the same time, I hope this gains momentum and that Netflix maybe merges/aligns Upper with at least schema.org (popular for webpage metadata) or ISO/CD 23726-3 Industrial Data Ontology (quite popular in Oil & Gas).
P.S. While we are on the topic of things that might or might not mean anything at all - my favourites are atomless gunk and worldless junk. Are they real? We might never know.
Further suggested reading is summarized in my comment on the orange site.
I hope this or a similar initiative goes big, because the real benefits of such approaches only materialize when you need to connect more than three systems with differing information models, and when there is enough uptake in the market that tools ship with such APIs out of the box. For example, OSLC (disclosure: I am involved with the project) gets attention when someone tries to connect IBM Jazz and Siemens Polarion, which come with OSLC out of the box; there is far less interest in creating OSLC APIs yourself for the systems/tools you wish to integrate, even in cases where the benefits are the same as for Jazz/Polarion.
Designing models independently from how they will be used sounds like a recipe for disaster. The idea that concepts like ‘actor’ or ‘movie’ have universal descriptions independent from how the concepts are utilized seems deeply flawed to me.
I mean, that’s sort of the basis of CQRS, for instance, is it not? You separate the storage and structure of a thing from the projections of that data. In the simplest case, that’s materialized views over joined and computed properties, e.g. an API powering a UI component where you join a User, a list of Dashboards, and Permissions into a single representation to serve a UI view.
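A minimal sketch of that read-side projection, using the entity names from the example above (the record shapes are my assumptions, not anyone's actual models):

```python
# CQRS-style read projection: the write model stores User, Dashboard and
# Permission separately; the read side joins them into one denormalized
# representation for a UI view. All shapes here are illustrative.
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str


@dataclass
class Dashboard:
    id: int
    owner_id: int
    title: str


@dataclass
class Permission:
    user_id: int
    scope: str


def project_user_view(user, dashboards, permissions):
    """Materialize a single UI-facing representation from the write model."""
    return {
        "user": user.name,
        "dashboards": [d.title for d in dashboards if d.owner_id == user.id],
        "permissions": sorted(p.scope for p in permissions if p.user_id == user.id),
    }


view = project_user_view(
    User(1, "ada"),
    [Dashboard(10, 1, "Traffic"), Dashboard(11, 2, "Billing")],
    [Permission(1, "read"), Permission(1, "write")],
)
# view == {"user": "ada", "dashboards": ["Traffic"], "permissions": ["read", "write"]}
```

The point of the separation is that this projection can be rebuilt, cached, or reshaped per consumer without touching the write model.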
It seems like they have a big GraphQL endpoint that lets downstream systems get most of the way there, building the projections they need from a single, agreed-upon model that has consensus. In a large org with a sprawling and varied data landscape, that is pretty powerful. The only major downside I can think of is the maintenance burden if different teams are expected to maintain and do upkeep on the models in a non-trivial way: your models could rot pretty quickly, eroding the validity of the overall model.
They’re encoding a central write model; how systems project that data for use further down the line is up to them, but it’s based off concrete models whose overall structures and properties have consensus on meaning through the UDA, I think.
Data, sure, but how do you model complex domain behavior and the constraints related to it? While data discovery, access, and coherence are important, isn’t ending up with a big ball of functionality mud the bigger issue?