Thoughts on ECS
15 points by ngrilly
Regarding ECS and cache coherency, the writer makes a few points that in my understanding are problematic or simply plain wrong.
Cache misses - problematic thought experiments
First, regarding reading three pieces of data for the movement system: in an ECS with a single vector per component, the movement system will indeed incur the three cache misses mentioned at the start of its run. But those first misses are likely serviced side by side, costing roughly as much in time as a single miss; beyond them, the work proceeds iteratively over the vectors with hardware prefetching kicking in. In an ECS with archetypes, the movement system will see 3 * N cache misses, where N is the number of archetypes.
If one is actually reading just a single entity’s three data points, then yes, that will cause three cache misses for a single action. But who cares? It’s apparently a one-off thing; this isn’t something done for many entities, otherwise they’d care about multiple entities being read in a loop. And if it’s a one-off thing (like a debugger or a stats screen), then the speed is irrelevant: 200 nanoseconds, 600 nanoseconds, 100 microseconds, none of it will matter to the user. This, I think, is the problem in the blog post: they reason about performance in a one-off case and conclude “well, that’s problematic”, which is a mistake. They should be thinking in multiples; as Mike Acton put it, try looking along the time axis.
Archetypes - plain wrong
The plain wrong part is, in my understanding, archetypes. Now, there are many ECS implementations, so maybe some have an archetype system that actually creates struct-like objects in memory, but at least the first three Google results (for me, anyway) describe archetypes in an entirely different way [1][2][3].
An archetype is a way to solve the problem of two entities sharing some components, but not all of them. If one tries to store their components in the same vectors, some of those vectors will end up with different lengths because not all of the entities have all components. Any attempt to fix this generally involves leaving holes in the component vectors (and then trying to move them to the end or start if at all possible), and it’s probably not possible to solve in a general way. Archetypes solve the problem by having multiple sets of component vectors, one per “archetype”, so that each set is guaranteed to contain vectors of the same length: every entity of a given archetype has exactly the same components. Note that these are still component vectors: the components are still stored densely in memory, in homogeneous vectors alongside one another, not joined together into structs!
Changing the archetype of an entity, i.e. adding or removing components from it, is indeed performance intensive. That is probably relevant in some applications (e.g. mass-upgrading troops) and irrelevant in others (e.g. where there’s no dynamic composition). Pointers to component data shouldn’t be kept around anyway (an ECS is supposed to operate on indexes), so I don’t really see how pointer invalidation is an issue.
Criticism of ECS
There are probably a lot of things in ECS in general that can be pointed at and called “not good”. Debugging is certainly one of them. The dynamic nature of an ECS I would not call valid criticism: an ECS doesn’t need dynamic component adding and removing; I’m sure you can find ECS libraries without those features, or with ways to build static-components-only entities that avoid all the overhead of entity-component mapping. You can also, of course, build your own program around an ECS that has none of those features.
One criticism that I haven’t seen very often, but that does have a real effect, is that growing a SoA vector is surprisingly expensive. The cost of copying a lot of data to make room for new capacity is not insignificant, and it may ruin your day if your SoA vectors have to grow often for some reason.
Sensible ECS
The proposed struct here may work perfectly well for the writer; there’s nothing wrong with it if they want to stay closer to a traditional object-oriented engine design, but I’d be wary of calling it an ECS or even really data-oriented. First, the Monster struct may have padding: of course we cannot know, but it’s entirely possible that the fields do not align nicely. Even if there is no padding inside the struct, putting these in a vector may introduce padding between “entities”. Second, a single monster’s data is now indeed laid out linearly in memory, but no one cares about a single monster. The performance of a single thing is meaningless: either that thing truly happens once, and its performance is immaterial to the performance of the system at large, or it actually happens as part of a larger whole, and that whole is what should be reasoned about. Again, think in multiples.
The reusable system-like movement update function nicely shows this problem: when iterating over all monsters the code will be loading the path finding data into cache because the monster’s data is not “scattered”, but it does not get used. Every single byte of path finding data will lead to a byte of some other, potentially usable data being evicted from cache for no reason. “But what if I then update path finding right after, huh? Then it’ll be in cache!” Yes, it’s possible to come up with scenarios where the extra data does soon get used; you need to know how you use your data and measure, measure, measure. But in general, loading data that does not get used is folly.
The reusable update functions also suffer from “thinking in singles”: the code takes a single pointer to each component and presumably performs some calculation through them. The compiler will likely not touch this function very much; the reads and writes through the pointers are what it is given and what it shall use. Even if the update function gets inlined, the memory layout of Monster is such that the compiler can mostly just read and write data as instructed.
Compare this to an actual ECS where the Transform, Velocity, Collider, and PathFinding components each live in their own vector: the update function now takes the Transform, Velocity, and Collider vectors and iterates over their entire length, doing independent operations at each index. This the compiler will happily convert into a SIMD algorithm. Not “scattering” the data made the program something like 2-10x slower than it could’ve been.
All in all, I’m not exactly convinced by these thoughts. That being said, I’m writing a JavaScript engine built around a static ECS-like model so I’m not exactly the most objective person to comment on this.