Pure vs. impure iterators in Go
12 points by carlana
I think the terms the author is looking for are restartable and non-restartable.
Using "pure" is confusing -- you can have a restartable iterator over a file on disk (under assumption of no concurrent mutation) and you can have a non-restartable one as well (e.g. if you're modifying a seek cursor tracked by libc for you). However, both of those would be "impure" because you're doing I/O.
Is anyone else horrified that there's no way to tell whether an iterator you're range-ing over twice will continue where you left off or restart from the beginning?
I just know that this is going to end up a source of bugs for people. Say, any situation where someone is first consuming a preamble, and then "everything else" (which silently contains the preamble again).
I don't expect it to come up very often, but I could see it being really hard to debug (or for it to just stay hidden without anyone ever noticing).
I guess what I am saying is that I am so far still not convinced iterators for Go are that much of a value add. (The Seq and Seq2 distinction due to the lack of tuples doesn't help with this.)
Is anyone else horrified that there's no way to tell whether an iterator you're range-ing over twice will continue where you left off or restart from the beginning?
Given this is a normal behaviour for "iterables" in essentially every other language and so far it's not exactly proven to be a massive hindrance, not really? You assume your iterables are one shot, and if you need repeatable iteration you either check if the concrete type is repeatable, or you reify to a sequence (usually some sort of arraylist), or if available you use some sort of tee to fork the iterator for you (depending how the iterations are scattered a tee may not have to buffer the entire thing the way a sequence would).
I'm not going to say this issue never occurs, but the main occurrences I remember personally were while migrating $DAYJOB's codebase from Python 2 to Python 3, when a few iterables became non-repeatable (e.g. map/filter). In a few of those cases the list() wrappers were eagerly removed on the assumption that the results were iterated only once, but it turned out they were passed down half a dozen methods and ended up being used in multiple loops.
I don't think it even remotely ranks in the list of Go footguns.
Given this is a normal behaviour for "iterables" in essentially every other language and so far it's not exactly proven to be a massive hindrance, not really? You assume your iterables are one shot, and if you need repeatable iteration you either check if the concrete type is repeatable, or you reify to a sequence (usually some sort of arraylist), or if available you use some sort of tee to fork the iterator for you (depending how the iterations are scattered a tee may not have to buffer the entire thing the way a sequence would).
In Go, it’s no longer safe to assume iterators are one-shot. In Python an iterator over a list is one-shot but in Go it’s repeatable.
In Go, it’s no longer safe to assume iterators are one-shot.
This statement is nonsensical. You can always assume an iterator is one-shot, just ignore those which are repeatable.
In Python an iterator over a list is one-shot but in Go it’s repeatable.
In Python, an iterable may or may not be one-shot. And that is what a Go iterator corresponds to.
That seems like quite a squint. In Python, an iterable is a thing which yields iterators—things with a next method. Iterating over the iterator calls the next method repeatedly until the iterator is exhausted, and if iteration is stopped prematurely then you can call next() again and get the next value.
Go’s Seq is neither an iterable nor an iterator. It doesn’t have a method for yielding a thing with a Next() method which is used for iteration. You can convert between the two with a goroutine and a channel (which I think is what iter.Pull does) but that’s not the basis for iteration and anyway it’s quite complex.
The footgun is that Go’s sequences seem at first glance like they should be an iterator (or iterable) but they’re not (except when they are).
That seems like quite a squint.
It really isn't, and it very much explains the behaviour.
In Python, an iterable is a thing which yields iterators—things with a next method.
An iterable is a thing you can iterate over. An iterator is the state for an iteration instance. Go gives you the first one, but doesn't give you the second one (save via iter.Pull but that's not what its iteration protocol uses).
Go’s Seq is neither an iterable nor an iterator.
iter.Seq is a promise on the ability to iterate but does not reify the iteration state. That's an iterable. Iterables can be stateless or stateful, which is how iter.Seq behaves.
You can convert between the two with a goroutine and a channel (which I think is what iter.Pull does)
Not that this is relevant to the subject, but iter.Pull uses coroutines, not goroutines and channels.
The footgun is that Go’s sequences seem at first glance like they should be an iterator (or iterable) but they’re not (except when they are).
Misnaming aside I don't think there is any actual footgun, just a moral panic. Nobody is surprised that repeatedly iterating on a slice yields the same thing over and over again (barring mutations) but doing the same on a channel does not.
And, again, Go's "iterators" / sequences are iterables. You can see all the same behaviours in different iterables. Hell, you can even see iterables with the same purpose being in a different "category" between languages: in Python, dict.values() is a view object which is repeatedly iterable, while in Rust it returns a one-shot iterable.
It really isn't [a squint] … An iterable is a thing you can iterate over. An iterator is the state for an iteration instance. Go gives you the first one, but doesn't give you the second one
This is what I meant by “it seems like a squint and a footgun”. It looks vaguely like a Python iterable (or an iterator) such that people are likely to use it as such without immediately realizing the perilous differences (notably that iteration may or may not resume).
You can blame and insult the users for not understanding it properly, but that does not rebut the “footgun” claim (a thing does not have to befuddle the infallible user in order to be a footgun).
in Python, dict.values() is a view object which is repeatedly iterable, while in Rust it returns a one-shot iterable.
In both cases, you get a stateful, resumable iterator. This isn’t true in Go, and this is the salient detail which will inevitably trip up users.
I don't think it's a massive footgun either, but I've been spoiled by Rust iterators which are always resumable/"one-time use".
I'm not sure if a similar design was possible for Go, but it's slightly annoying that we're taking a step back here. (I don't think you lose anything by making each iterator resumable, fwiw.)
Re: Python, the most confusing thing here is imo just the difference between iterators and iterables. This isn't really a problem in Go as far as I can tell.
I don't think it's a massive footgun either, but I've been spoiled by Rust iterators which are always resumable/"one-time use".
Because Rust generally uses ownership transfers for them, such cases are definitely rarer, but you can absolutely get similar concerns when getting iterators from unique references (or worse, when interior mutability gets involved). Here's the article's fib3 in Rust (yes, it's a lot gnarlier and you would skip Fib3 in Rust, but the synthetic example is just as dumb in the original version; this shows that with a more complicated system of interacting stateful types it's absolutely possible).
Re: Python, the most confusing thing here is imo just the difference between iterators and iterables. This isn't really a problem in Go as far as I can tell.
Because Go does not have the iterator part as a first-class object: what Go calls iterators is closer to Python iterables, or Rust's IntoIterator. iter.Pull is how you get what other languages call an iterator.
Although because Go doesn't have an iterator (by default) you always have to interact with an "iterable", and that does mean it's easy to make the mistake of putting the state into the iterable side.
Is anyone else horrified that there's no way to tell whether an iterator you're range-ing over twice will continue where you left off or restart from the beginning?
No? It’s the same as other iteration patterns like calling Next() repeatedly to get it to move forward. Iterators over database rows, tokens in a tokenizer, lines in a file, generators, they all rely on this behavior. It’s generally clear in these cases they aren’t restarting.
Yes, exactly. If I get some iterator from a library somewhere (e.g. the Fibonacci iterator example), then I would not expect it to suddenly reset after I've consumed parts of the stream.
The problem isn't resumable iterators; it's that it's not clear whether an iterator is resumable in the first place without analyzing the code.
You can easily imagine this causing subtle bugs, especially if the difference between creating a resumable and a non-resumable iterator may be as little as s := s.
No? It’s the same as other iteration patterns like calling Next() repeatedly to get it to move forward. Iterators over database rows, tokens in a tokenizer, lines in a file, generators, they all rely on this behavior. It’s generally clear in these cases they aren’t restarting.
You’re concluding that these are the same by pointing out that the Next() iterators were always stateful, but that’s precisely why they are different—the new standard library iter.Seq interfaces are frequently stateless. If iter.Seq was always stateful no one would have a problem.