Commodore 64 Assembly, part 4: how does the machine execute machine code?
4 points by mms
This is interesting, and as a narrative describing what the CPU is doing it sounds pretty much right, although I find it a little hard to follow. The author doesn't seem to know how the instructions are actually decoded (the PLA in the 65xx).
Then the dance starts once again. What is sad here is that each operation takes a different number of CPU cycles, as a different number of actual operations need to take place. To the best of my knowledge, this is the same for all x86 CPUs, but all instructions on ARM and RISC take the same number of cycles.
As far as I can tell, this is not correct, although it is hard to find a global list of cycle counts for ARM processors. On the face of it, different operations are going to take different numbers of clock cycles.
The most obvious multi-cycle ARM instructions are LDM and STM, load and store multiple.
Simple pipelined in-order RISC cores typically need one cycle for an ALU op and an extra cycle for a memory op, and the cost of a branch depends on the depth of the pipeline.
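A toy cost model makes those per-class figures concrete. The numbers used here (one cycle per ALU op, one extra for a load or store, and a taken-branch penalty equal to the number of stages fetched behind the branch before it resolves) are illustrative assumptions, not timings for any real ARM or MIPS core, and the model ignores stalls from data hazards and cache misses. CostModel and estimate_cycles are hypothetical names.

```python
from dataclasses import dataclass

# Hypothetical instruction classes for a toy in-order pipeline model.
ALU, MEM, BRANCH = "alu", "mem", "branch"

@dataclass
class CostModel:
    alu: int = 1            # assumed: one cycle per ALU op
    mem_extra: int = 1      # assumed: one extra cycle for a load/store
    resolve_stage: int = 3  # assumed: pipeline stage in which a branch resolves

    def cost(self, kind: str, taken: bool = False) -> int:
        if kind == ALU:
            return self.alu
        if kind == MEM:
            return self.alu + self.mem_extra
        if kind == BRANCH:
            # A taken branch throws away the instructions fetched behind it
            # before it resolved: resolve_stage - 1 wasted cycles.
            return 1 + (self.resolve_stage - 1 if taken else 0)
        raise ValueError(f"unknown instruction class: {kind}")

def estimate_cycles(program, model=None):
    """Sum per-instruction costs; program is a list of (kind, taken) pairs."""
    if model is None:
        model = CostModel()
    return sum(model.cost(kind, taken) for kind, taken in program)

loop_body = [(ALU, False), (MEM, False), (ALU, False), (BRANCH, True)]
print(estimate_cycles(loop_body))                            # prints 7
print(estimate_cycles(loop_body, CostModel(resolve_stage=5)))  # prints 9
```

The only point of the sketch is that even a "simple" RISC pipeline has no single per-instruction cycle count: the same loop costs more as the branch resolves later in a deeper pipeline.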
Precisely. Only the very earliest RISC designs had anything like one instruction, one cycle. PowerPC, for example, is littered with instructions that may have extended latencies, particularly the string and multiple-word instructions, or accesses to unaligned memory (PPC handles these in hardware, while many other RISCs throw a fault). On later processors like the G5, there may also be latencies depending on where in a dispatch group the instruction is, and whether it's microcoded or “cracked.”
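One way to see why unaligned accesses can cost extra: if the hardware can only read aligned chunks of the bus width, an access that straddles a chunk boundary needs two reads instead of one. The sketch below just counts the chunks touched; the assumption that hardware splits an access this way is a simplification, not a description of any particular PowerPC implementation, and aligned_accesses is a hypothetical name.

```python
def aligned_accesses(addr: int, size: int, bus_width: int = 4) -> int:
    """Count the aligned bus_width-byte chunks touched by a size-byte access at addr."""
    if size <= 0 or bus_width <= 0:
        raise ValueError("size and bus_width must be positive")
    first_chunk = addr // bus_width
    last_chunk = (addr + size - 1) // bus_width
    return last_chunk - first_chunk + 1

for addr in (0x1000, 0x1001, 0x1003):
    # 4-byte load: aligned at 0x1000, straddles a boundary at 0x1001 and 0x1003
    print(hex(addr), aligned_accesses(addr, 4))
```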
Understanding the 6502’s PLA is critical to understanding how it interprets individual opcodes and why it fetches one, two or no additional bytes. The PLA is how this is all sequenced, and also explains where undocumented instructions come from on NMOS 6502s. I appreciate the attempt to make the article approachable but for the 6502 in particular it can’t be handwaved away easily.
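The effect of decoding on the fetch sequence can be shown with a lookup table: once the opcode is known, the CPU fetches zero, one or two operand bytes, and the addressing mode also sets the base cycle count. This is a plain table standing in for the decode logic, not a model of the PLA itself; it covers only NOP and the documented LDA modes, with NMOS 6502 byte and cycle counts (excluding the extra cycle for a page crossing on indexed reads). The function name disassemble is hypothetical.

```python
# Subset of NMOS 6502 opcodes: opcode -> (mnemonic and mode, operand bytes, base cycles).
# Cycle counts exclude the +1 penalty for crossing a page boundary on indexed reads.
OPCODES = {
    0xEA: ("NOP",        0, 2),
    0xA9: ("LDA #imm",   1, 2),
    0xA5: ("LDA zp",     1, 3),
    0xB5: ("LDA zp,X",   1, 4),
    0xAD: ("LDA abs",    2, 4),
    0xBD: ("LDA abs,X",  2, 4),
    0xB9: ("LDA abs,Y",  2, 4),
    0xA1: ("LDA (zp,X)", 1, 6),
    0xB1: ("LDA (zp),Y", 1, 5),
}

def disassemble(code: bytes, origin: int = 0):
    """Walk a byte stream: fetch an opcode, then 0, 1 or 2 operand bytes."""
    pc, out = 0, []
    while pc < len(code):
        name, n_operands, cycles = OPCODES[code[pc]]  # KeyError if not in this subset
        operands = code[pc + 1 : pc + 1 + n_operands]
        if len(operands) < n_operands:
            raise ValueError(f"truncated instruction at offset {pc}")
        out.append((origin + pc, name, bytes(operands), cycles))
        pc += 1 + n_operands
    return out

# LDA #$01 / LDA $C000 / NOP / LDA ($FB),Y
program = bytes([0xA9, 0x01, 0xAD, 0x00, 0xC0, 0xEA, 0xB1, 0xFB])
for addr, name, ops, cycles in disassemble(program, origin=0xC000):
    print(f"${addr:04X}  {name:<12} {ops.hex(' '):<6} {cycles} cycles")
```

Note that the absolute operand $C000 is stored low byte first (00 C0), since the 6502 is little-endian.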
There is a structure to the opcode byte on the 6502, but it's broken up as 3-3-2 (3 bits, 3 bits, 2 bits). If the lower two bits are 01, then the top three bits define the ALU operation on the A register (ORA, AND, EOR, ADC, STA, LDA, CMP, SBC, in that order) and the middle three bits define the addressing mode (indirect X, zero page (8-bit address), immediate, absolute (16-bit address), indirect Y, zero page X, absolute Y, absolute X, in that order). If the lower two bits are 10, the top three bits map to different operations (ASL, ROL, LSR, ROR, STX, LDX, DEC, INC, in that order), 00 maps pretty much the rest of the instructions, and if the bottom two bits are 11, there are no documented opcodes.
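The 3-3-2 split can be sketched as a few shifts and masks over the tables described in the comment. This is a plain field decoder written from that description, not a model of the PLA; the function names are hypothetical, and for the 10 group it returns only the operation, since the addressing modes there are less regular (STX and LDX, for example, use Y-indexed modes). The regularity is not perfect even in the 01 group: $89 would decode as "STA #imm", which is not a documented instruction.

```python
# Field tables for the 6502's aaabbbcc opcode layout.
GROUP01_OPS = ["ORA", "AND", "EOR", "ADC", "STA", "LDA", "CMP", "SBC"]
GROUP01_MODES = ["(zp,X)", "zp", "#imm", "abs", "(zp),Y", "zp,X", "abs,Y", "abs,X"]
GROUP10_OPS = ["ASL", "ROL", "LSR", "ROR", "STX", "LDX", "DEC", "INC"]

def split_fields(opcode: int):
    """Split an opcode byte into its 3-3-2 fields (aaa, bbb, cc)."""
    return (opcode >> 5) & 0b111, (opcode >> 2) & 0b111, opcode & 0b11

def decode_group01(opcode: int) -> str:
    """Operation and addressing mode for an opcode whose low bits are 01."""
    aaa, bbb, cc = split_fields(opcode)
    if cc != 0b01:
        raise ValueError(f"${opcode:02X} is not in the cc=01 group")
    return f"{GROUP01_OPS[aaa]} {GROUP01_MODES[bbb]}"

def group10_op(opcode: int) -> str:
    """Operation only, for an opcode whose low bits are 10."""
    aaa, _, cc = split_fields(opcode)
    if cc != 0b10:
        raise ValueError(f"${opcode:02X} is not in the cc=10 group")
    return GROUP10_OPS[aaa]

print(decode_group01(0xA9))  # LDA #imm
print(decode_group01(0xB9))  # LDA abs,Y
print(group10_op(0xE6))      # INC
```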
Only the very earliest RISC designs had anything like one instruction, one cycle.
For example, MIPS has a branch delay slot to hide the latency of a pipeline flush due to a branch. A branch takes a few cycles, during which most of the core is idle… except if you fill some of the gap with an instruction that was already most of the way through the pipeline.
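The delay-slot behaviour can be illustrated with a toy interpreter. The instruction set (addi, j), the register names and the program are hypothetical, and everything else about MIPS is ignored; the sketch only shows that the instruction right after a jump still executes before control reaches the target, compared with a machine that has no delay slot.

```python
def run(program, delay_slot=True, max_steps=100):
    """Execute a toy program; return (register values, indices of executed instructions)."""
    regs = {"r1": 0, "r2": 0}
    pc, pending_target, trace = 0, None, []
    for _ in range(max_steps):
        if pc >= len(program):
            break
        op, *args = program[pc]
        trace.append(pc)
        next_pc = pc + 1
        if pending_target is not None:
            # This instruction sat in the delay slot; the jump takes effect after it.
            next_pc, pending_target = pending_target, None
        if op == "addi":
            reg, imm = args
            regs[reg] += imm
        elif op == "j":
            (target,) = args
            if delay_slot:
                pending_target = target  # run the next instruction first
            else:
                next_pc = target
        pc = next_pc
    return regs, trace

program = [
    ("j", 3),           # 0: jump to index 3
    ("addi", "r1", 1),  # 1: delay slot, still executed when delay_slot=True
    ("addi", "r2", 100),# 2: skipped either way
    ("addi", "r2", 1),  # 3: jump target
]
print(run(program))                    # ({'r1': 1, 'r2': 1}, [0, 1, 3])
print(run(program, delay_slot=False))  # ({'r1': 0, 'r2': 1}, [0, 3])
```

With the delay slot, the compiler's job is to put useful work at index 1 instead of a NOP, so the cycle spent there is not wasted.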