Model Once, Represent Everywhere: UDA (Unified Data Architecture) at Netflix
9 points by driib
Upper is the metamodel for Connected Data in UDA — the model for all models. It is designed as a bootstrapping upper ontology, which means that Upper is self-referencing, because it models itself as a domain model; self-describing, because it defines the very concept of a domain model; and self-validating, because it conforms to its own model.
Are we sure this means something?
I am somewhat certain it does. The BFO book or a lecture may need to be consumed and understood before this fully makes sense, unfortunately. Upper ontologies are quite useful for preventing grave information-modelling mistakes (cf. BFO's continuants vs occurrents). This helps only if you decide to forego ontological nominalism (“these things will be called ducks because I said so”) and instead adopt ontological realism (“these things look similar and quack, therefore we will call them ducks”).
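The continuant/occurrent split alluded to above can be sketched roughly in code. This is only an illustration of the modelling distinction; the class names are mine, not BFO's normative identifiers:

```python
# Rough sketch of BFO's top-level split: continuants persist through
# time (a person, a duck), while occurrents unfold in time (a quacking,
# a screening). Class names are illustrative, not normative BFO IRIs.
from dataclasses import dataclass, field


@dataclass
class Continuant:
    """An entity that persists and can change while remaining itself."""
    label: str


@dataclass
class Occurrent:
    """An entity that happens or unfolds; it has temporal parts."""
    label: str
    participants: list = field(default_factory=list)  # continuants taking part


duck = Continuant("Donald")
quacking = Occurrent("quacking", participants=[duck])

# The mistake an upper ontology guards against: treating a process as if
# it were an object, e.g. giving 'quacking' a weight or a location. Here
# the type system at least forces you to say which kind of thing it is.
assert duck in quacking.participants
```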
I, however, have two nits about the statement you highlighted:
At the same time, I hope this gains momentum and that Netflix maybe merges/aligns Upper with at least schema.org (popular for webpage metadata) or ISO/CD 23726-3 Industrial Data Ontology (quite popular in Oil & Gas).
P.S. While we are on the topic of things that might or might not mean anything at all - my favourites are atomless gunk and worldless junk. Are they real? We might never know.
Further suggested reading is summarized in my comment on the orange site.
I hope this or a similar initiative goes big, because the real benefits of such approaches only materialize when you need to connect more than three systems with differing information models, and when there is enough uptake in the market that tools ship with such APIs out of the box. For example, OSLC (disclosure: I am involved with the project) gets attention when someone tries to connect IBM Jazz and Siemens Polarion, which come with OSLC out of the box; there is far less interest in creating OSLC APIs yourself for the systems/tools you wish to integrate, even in cases where the benefits are the same as for Jazz/Polarion.
Designing models independently from how they will be used sounds like a recipe for disaster. The idea that concepts like ‘actor’ or ‘movie’ have universal descriptions independent from how the concepts are utilized seems deeply flawed to me.
I mean, that’s sort of the basis of CQRS, for instance, is it not? You separate the storage and structure of a thing from the projections of that data. In the simplest case, that’s materialized views over joined and computed properties, e.g. an API powering a UI component where you join a User, a list of Dashboards, and Permissions into a single representation to serve a UI view.
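A minimal sketch of that read-side projection, using the entity names from the example above (the record shapes are my assumptions, not anyone's actual models):

```python
# CQRS-style read projection: the write model stores User, Dashboard and
# Permission separately; the read side joins them into one denormalized
# representation for a UI view. All shapes here are illustrative.
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str


@dataclass
class Dashboard:
    id: int
    owner_id: int
    title: str


@dataclass
class Permission:
    user_id: int
    scope: str


def project_user_view(user, dashboards, permissions):
    """Materialize a single UI-facing representation from the write model."""
    return {
        "user": user.name,
        "dashboards": [d.title for d in dashboards if d.owner_id == user.id],
        "permissions": sorted(p.scope for p in permissions if p.user_id == user.id),
    }


view = project_user_view(
    User(1, "ada"),
    [Dashboard(10, 1, "Traffic"), Dashboard(11, 2, "Billing")],
    [Permission(1, "read"), Permission(1, "write")],
)
# view == {"user": "ada", "dashboards": ["Traffic"], "permissions": ["read", "write"]}
```

The point of the separation is that this projection can be rebuilt, cached, or reshaped per consumer without touching the write model.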
It seems like they have a big GraphQL endpoint that lets downstream systems get most of the way there, building the projections they need from a single, agreed-upon model that has consensus. In a large org with a sprawling and varied data landscape, that is pretty powerful. The only major downside I can think of is the maintenance burden if different teams are expected to maintain and do upkeep on the models in a non-trivial way: your models could rot pretty quickly, eroding the validity of the overall model.
They’re encoding a central write model; how systems project that data for use further down the line is up to them, but it’s based off concrete models whose overall structures and properties have consensus on meaning through the UDA, I think.
Data, sure, but how do you model complex domain behavior and the constraints related to it? While data discovery, access, and coherence are important, isn’t ending up with a big ball of functionality mud the bigger issue?