Five Years of Trying to Add Recursion to lychee

10 points by vbernat


quad

The key move is that a WaitGuard can be cloned. A task can spawn sub-tasks (recursion!) while preserving the invariant that the WaitGroup only completes once every guard — including the ones held by recursive sub-tasks — has been dropped.

Last year, I was toyed with supervision trees for Rust futures. During my experimentation, I came to appreciate the genius of Tokio's CancellationToken. Specifically, the .child_token method:

Creates a CancellationToken which will get cancelled whenever the current token gets cancelled. Unlike a cloned CancellationToken, cancelling a child token does not cancel the parent token.

I wrote the following in my notes:

Finally, we can also see that the one-for-one and one-for-all restart strategies can be implemented using the shutdown token.

  1. A supervisor clones its shutdown token to create a token for its first generation of workers. (generation token)
  2. It clones a unique shutdown token from the generation token for each of its workers.
  3. A one-for-one strategy is effected by spending a worker's unique token; a one-for-all strategy is effected by spending the supervisor's generation token.

n.b. a rest-for-all strategy can also be implemented using a shutdown token, but it requires a bit more complexity. Either a token itself can offer a method to spend younger siblings' tokens as well itself, or a supervisor can construct an unbalanced tree of shutdown tokens. The details are left to the reader to as an exercise.

I keep running into the tree of tokens pattern in distributed systems. It probably has a name and a whole body of literature behind it. Sadly, I'm not read in.

pwy

I'm sure it's just me not appreciating the depth of the problem, but:

The classic solutions (Dijkstra–Scholten, token passing) just don’t map well onto Tokio’s channel-based world.

WaitGroup - effectively a form of token passing - seems to pair extremely well with channels, no?

You can even simplify it - the dumbest you can go is Arc<()> on which you call Arc::strong_count() every millisecond: once it drops down to one, it means every task has completed and no more tasks will appear, since you're the only owner left.

(either that or you forgotten to clone the Arc into a child task, ofc.)

carlana

I haven't used async Rust, so I can't comment on how to solve this problem there, but I've written (and rewritten) a recursive link fetcher in Go, so I can comment from that angle, that the thing that makes reasoning about a concurrent system feasible is centralizing the bookkeeping logic in a single thread/goroutine/whatever. I wrote about the principle in https://blog.carlana.net/post/share-memory-by-communicating/.

You can see the heart of my link fetcher here: https://github.com/spotlightpa/linkrot/blob/cfa44b0947a18a4d87502e1bfc47940f91522aa1/linkcheck/linkcheck.go#L176-L212 The "tasks" run concurrently, but the "manager" runs serially.