Python 3.15’s interpreter for Windows x86-64 should hopefully be 15% faster

23 points by ngoldbaum

wareya

The next place tail call interpreters can go from here is bypassing the opcode lookup entirely and putting the function pointers for the opcode handlers directly in the bytecode stream. When done correctly, this reduces the number of memory accesses per dispatch by one (the load from the handler table goes away) and gives you performance similar to the most trivial possible "threaded code" JITs. It does require making the bytecode format different (and bigger) for this type of interpreter, or doing evil things to function locations at link time to limit how far apart the handlers' addresses can be (if you want to store only their lowest X bits in the bytecode).
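To make that concrete, here is a rough sketch of what "function pointers in the bytecode stream" could look like in C. Everything in it (`Cell`, `Handler`, `DISPATCH`, the `op_*` handlers, `run_demo`) is a made-up name for illustration and has nothing to do with CPython's actual code. It assumes a toy stack machine, and it only requests guaranteed tail calls on clang via `__attribute__((musttail))`; on other compilers the dispatch falls back to ordinary calls that the optimizer may or may not turn into jumps.

```c
#include <stdint.h>

/* Ask for a guaranteed tail call where the compiler is known to support it. */
#if defined(__clang__) && defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL /* plain call; not guaranteed to be a tail call */
#endif

typedef union Cell Cell;
typedef int64_t (*Handler)(const Cell *ip, int64_t *sp);

/* One slot of the (hypothetical) code stream: either a handler address
   or an inline operand. There is no opcode number anywhere. */
union Cell {
    Handler fn;
    int64_t imm;
};

/* Dispatch is a single load of the handler pointer from the stream,
   followed by an indirect jump. No handler-table lookup is involved. */
#define DISPATCH(ip, sp) MUSTTAIL return (ip)->fn((ip), (sp))

static int64_t op_push(const Cell *ip, int64_t *sp) {
    *sp++ = ip[1].imm;   /* operand sits in the cell after the handler */
    ip += 2;
    DISPATCH(ip, sp);
}

static int64_t op_add(const Cell *ip, int64_t *sp) {
    sp[-2] += sp[-1];    /* pop two values, push their sum */
    sp--;
    ip += 1;
    DISPATCH(ip, sp);
}

static int64_t op_mul(const Cell *ip, int64_t *sp) {
    sp[-2] *= sp[-1];    /* pop two values, push their product */
    sp--;
    ip += 1;
    DISPATCH(ip, sp);
}

static int64_t op_halt(const Cell *ip, int64_t *sp) {
    (void)ip;
    return sp[-1];       /* result is whatever is on top of the stack */
}

/* Runs the program for (2 + 3) * 4. */
static int64_t run_demo(void) {
    int64_t stack[16];
    const Cell code[] = {
        {.fn = op_push}, {.imm = 2},
        {.fn = op_push}, {.imm = 3},
        {.fn = op_add},
        {.fn = op_push}, {.imm = 4},
        {.fn = op_mul},
        {.fn = op_halt},
    };
    return code[0].fn(code, stack);
}
```

Calling `run_demo()` from a `main` evaluates the little program. Note the size cost: on a typical 64-bit target every instruction slot here is a full pointer wide, where a conventional bytecode would spend a byte or two on the opcode. That is the "bigger" part, and it is what the low-bits trick would be trying to claw back.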