AVX-512: First impressions on performance and programmability
5 points by jmillikin
SIMT exposes a more “scalar”-like interface. In this case, your code would define the work for just a single pixel, and hardware/compiler together does the job of parallelizing the for loop
Mmm, I wanna see the author try ISPC!
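For anyone who hasn't seen the SIMT style the quote describes, here is a rough sketch in plain C. The names `brighten_pixel` and `brighten_image` are made up for illustration and are not from the article. The first function is the only part the programmer would write under that model; the second stands in for the loop that the hardware or compiler would otherwise supply (on a GPU it would be the thread grid, in ISPC a `foreach`).

```c
#include <stddef.h>
#include <stdint.h>

/* Per-pixel "kernel": describes the work for exactly one pixel.
 * Hypothetical example operation: add `amount`, saturating at 255. */
uint8_t brighten_pixel(uint8_t value, uint8_t amount) {
    unsigned sum = (unsigned)value + (unsigned)amount;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Stand-in for what the SIMT toolchain does for you: apply the kernel
 * to every pixel. Here it is an ordinary scalar loop; a SIMT system
 * would spread these iterations across lanes or threads instead. */
void brighten_image(uint8_t *pixels, size_t count, uint8_t amount) {
    for (size_t i = 0; i < count; i++) {
        pixels[i] = brighten_pixel(pixels[i], amount);
    }
}
```

With explicit SIMD intrinsics you would instead write the loop yourself, 64 bytes at a time, and handle the leftover pixels separately.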
I can’t comment on ISPC because I don’t know it
welp.
What makes AVX-512 more attractive to me is that it scales down to small data better than AVX2. It has much more uniform support for masking, especially for loads and stores, so there’s much less need for scalar code to deal with the unaligned ends of arrays. Nice for string ops!
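To make the masking point concrete, here is a minimal sketch of a byte counter, assuming GCC or Clang and a build with AVX-512BW enabled (for example `-mavx512bw`). The function name `count_byte` is made up for this example. The final partial chunk goes through the same vector path as the full chunks, using a masked load, so no scalar cleanup loop is needed. When AVX-512BW is not enabled at compile time, the sketch falls back to a plain loop so it still builds and runs.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX512BW__)
#include <immintrin.h>
#endif

/* Count how many bytes in buf[0..len) equal `needle`. */
size_t count_byte(const uint8_t *buf, size_t len, uint8_t needle) {
    size_t count = 0;
#if defined(__AVX512BW__)
    const __m512i target = _mm512_set1_epi8((char)needle);
    size_t i = 0;

    /* Full 64-byte chunks: unaligned load, compare into a 64-bit mask. */
    for (; i + 64 <= len; i += 64) {
        __m512i chunk = _mm512_loadu_si512((const void *)(buf + i));
        __mmask64 eq = _mm512_cmpeq_epi8_mask(chunk, target);
        count += (size_t)__builtin_popcountll(eq);
    }

    /* Tail of 1..63 bytes: build a mask with the low `rem` bits set.
     * The masked load does not touch (or fault on) masked-off bytes. */
    size_t rem = len - i;
    if (rem != 0) {
        __mmask64 tail = ~(__mmask64)0 >> (64 - rem);
        __m512i chunk = _mm512_maskz_loadu_epi8(tail, (const void *)(buf + i));
        /* Masked compare, so zero-filled lanes never match needle == 0. */
        __mmask64 eq = _mm512_mask_cmpeq_epi8_mask(tail, chunk, target);
        count += (size_t)__builtin_popcountll(eq);
    }
#else
    /* Portable fallback when AVX-512BW is not enabled at compile time. */
    for (size_t i = 0; i < len; i++) {
        count += (buf[i] == needle);
    }
#endif
    return count;
}
```

The AVX2 equivalent would typically need a separate scalar loop (or an overlapping load trick) for those last few bytes, which is exactly the overhead that hurts on short strings.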