Compositional Datalog on SQL: Relational Algebra of the Environment
21 points by philzook
21 points by philzook
This is a neat gem. Have you tried benchmarking it?
In the past, we’ve played around with executing Datalog or (Datalog-style) queries via SQL, but as recursive SQL queries themselves.
One idea I haven’t had time to play with is coding it the way you have with explicit iteration is that it means you could try using the SQL engine’s statistics and planner to dynamically change query plans for queries w/ many relations in response to the shifting relation sizes over a long-running query.
A good question. I have not tuned it yet, but the first benchmark I just tried is disastrously slower than souffle :(
Ok, with the caveat that this is a single artificial benchmark
.decl edge(x: number, y: number)
.decl path(x: number, y: number)
//.input edge
.output path
edge(n, n+1) :- n = range(0, 1000).
path(x,y) :- edge(x,y).
path(x,z) :- edge(x,y), path(y,z).
souffle /tmp/test.dl
Takes ~0.2s in interpreted mode, ~0.1s in compiled mode. -j
parallelization doesn’t seem to help much (too small?)
import time
import sqlite3
db = sqlite3.connect(":memory:")
t = time.time()
fix(db, [edge, path],
[edge(n, n+1) for n in range(1, 1000)] + \
[path(x,y) <= edge[x,y],
path(x,z) <= edge[x,y] & path[y,z]
])
time.time() - t
This takes about 0.7s
A seminaive python loop takes about 20s
edge1 = {(n,n+1) for n in range(1, 1000)}
path1 = edge1.copy()
dpath = edge1.copy()
newpath = set()
for i in range(1010):
#print(i, len(dpath), len(path1))
for (x,y) in edge1:
for (y2,z) in dpath:
if y == y2:
newpath.add((x,z))
dpath = newpath - path1
path1 |= newpath
newpath = set()
if not dpath:
break
Seminaive duckdb takes about 6s. Maybe this is because duckdb is not tuned for set semantics?
The non seminaive loop is much slower.
I’m satisfied by this. I never expected to beat souffle. I think this is a much simpler, more flexible system at maybe 3-5x overhead on this particular benchmark.