Compositional Datalog on SQL: Relational Algebra of the Environment

21 points by philzook

This is a neat gem. Have you tried benchmarking it?

In the past, we’ve played around with executing Datalog or (Datalog-style) queries via SQL, but as recursive SQL queries themselves.

One idea I haven’t had time to play with is coding it the way you have with explicit iteration is that it means you could try using the SQL engine’s statistics and planner to dynamically change query plans for queries w/ many relations in response to the shifting relation sizes over a long-running query.

philzook

A good question. I have not tuned it yet, but the first benchmark I just tried is disastrously slower than souffle :(

philzook

Ok, with the caveat that this is a single artificial benchmark

.decl edge(x: number, y: number)
.decl path(x: number, y: number)
//.input edge
.output path

edge(n, n+1) :- n = range(0, 1000).
path(x,y) :- edge(x,y).
path(x,z) :- edge(x,y), path(y,z).

souffle /tmp/test.dl Takes ~0.2s in interpreted mode, ~0.1s in compiled mode. -j parallelization doesn’t seem to help much (too small?)

import time
import sqlite3 
db = sqlite3.connect(":memory:")
t = time.time()
fix(db, [edge, path],
[edge(n, n+1) for n in range(1, 1000)] + \
[path(x,y) <= edge[x,y],
path(x,z) <= edge[x,y] & path[y,z]
])
time.time() - t

This takes about 0.7s

A seminaive python loop takes about 20s

edge1 = {(n,n+1) for n in range(1, 1000)}
path1 = edge1.copy()
dpath = edge1.copy()
newpath = set()
for i in range(1010):
    #print(i, len(dpath), len(path1))
    for (x,y) in edge1:
        for (y2,z) in dpath:
            if y == y2:
                newpath.add((x,z))
    dpath = newpath - path1
    path1 |= newpath
    newpath = set()
    if not dpath:
        break

Seminaive duckdb takes about 6s. Maybe this is because duckdb is not tuned for set semantics?

The non seminaive loop is much slower.

I’m satisfied by this. I never expected to beat souffle. I think this is a much simpler, more flexible system at maybe 3-5x overhead on this particular benchmark.