There are 256 opcodes, which are distributed between 30 instructions based on their weight (how often they will occur in the program on average). Instructions are divided into 5 groups:
The second operand is loaded either from a register or from an immediate value encoded within the instruction. The `reg(b)` flag encodes an integer register (instruction groups IA and IS) or a floating point register (instruction group FA). Instruction group FS doesn't use operand B.
|loc(b)[2:0]|B (IA)|B (IS)|B (FA)|B (FS)
|---------|-|-|-|-|
|000|integer `reg(b)`|integer `reg(b)`|floating point `reg(b)`|-
|001|integer `reg(b)`|integer `reg(b)`|floating point `reg(b)`|-
|010|integer `reg(b)`|integer `reg(b)`|floating point `reg(b)`|-
|011|integer `reg(b)`|integer `reg(b)`|floating point `reg(b)`|-
|100|integer `reg(b)`|`imm8`|floating point `reg(b)`|-
|101|integer `reg(b)`|`imm8`|floating point `reg(b)`|-
The third operand is the location where the result is stored. It can be a register or a 64-bit scratchpad location, depending on the value of flag `loc(c)`.
|loc\(c\)[2:0]|address size (W)| C (IA, IS)|C (FA, FS)
|---------|-|-|-|-|-|
|000|15 bits|scratchpad|floating point `reg(c)`
|001|11 bits|scratchpad|floating point `reg(c)`
|010|11 bits|scratchpad|floating point `reg(c)`
|011|11 bits|scratchpad|floating point `reg(c)`
|100|15 bits|integer `reg(c)`|floating point `reg(c)`, scratchpad
|101|11 bits|integer `reg(c)`|floating point `reg(c)`, scratchpad
|110|11 bits|integer `reg(c)`|floating point `reg(c)`, scratchpad
|111|11 bits|integer `reg(c)`|floating point `reg(c)`, scratchpad
Integer operations write either to the scratchpad or to a register. Floating point operations always write to a register and can also write to the scratchpad. In that case, bit 3 of the `loc(c)` flag determines if the low or high half of the register is written:
|loc\(c\)[3]|write to scratchpad|
|------------|-----------------------|
|0|floating point `reg(c)[63:0]`
|1|floating point `reg(c)[127:64]`
The FPROUND instruction is an exception and always writes the low half of the register.
For writing to the scratchpad, an integer register is always used to calculate the address:
Instructions ADD_32, SUB_32, AND_32, OR_32, XOR_32 only use the low-order 32 bits of the input operands. The result of these operations is 32 bits long and bits 32-63 of C are set to zero.
There are 5 different multiplication operations. MUL_64 and MULH_64 both take 64-bit unsigned operands, but MUL_64 produces the low 64 bits of the result and MULH_64 produces the high 64 bits. MUL_32 and IMUL_32 use only the low-order 32 bits of the operands and produce a 64-bit result. The signed variant interprets the arguments as signed integers. IMULH_64 takes two 64-bit signed operands and produces the high-order 64 bits of the result.
##### Division
For the division instructions, the dividend is 64 bits long and the divisor 32 bits long. The IDIV_64 instruction interprets both operands as signed integers. In case of division by zero or signed overflow, the result is equal to the dividend `A`.
*Division by zero can be handled without branching by a conditional move. Signed overflow happens only for the signed variant when the minimum negative value is divided by -1. This rare case must be handled in x86 (ARM produces the "correct" result).*
The shift/rotate instructions use just the bottom 6 bits of the `B` operand (`imm8` is used as the immediate value). All treat `A` as unsigned except SAR_64, which performs an arithmetic right shift by copying the sign bit.
Operand A is loaded from memory as a 64-bit value. All floating point instructions apart FPROUND interpret A as two packed 32-bit signed integers and convert them into two packed double precision floating point values.
The FPROUND instruction has a scalar output and interprets A as a 64-bit signed integer. The 11 least-significant bits are cleared before conversion to a double precision format. This is done so the number fits exactly into the 52-bit mantissa without rounding. Output of FPROUND is always written into the lower half of the result register and only this lower half may be written into the scratchpad.
#### Rounding
FPU instructions conform to the IEEE-754 specification, so they must give correctly rounded results. Initial rounding mode is *roundTiesToEven*. Rounding mode can be changed by the `FPROUND` instruction. Denormal values must be flushed to zero.
#### NaN
If an operation produces NaN, the result is converted into positive zero. NaN results may never be written into registers or memory. Only division and multiplication must be checked for NaN results (`0.0 / 0.0` and `0.0 * Infinity` result in NaN).
Both instructions are conditional. If the condition evaluates to `false`, CALL and RET behave as "arithmetic no-op" and simply copy operand A into destination C without jumping.
The CALL instruction uses a condition function, which takes the lower 32 bits of integer register `reg(b)` and the value `imm32` and evaluates a condition based on the `loc(b)` flag:
The 'signed' column specifies if the operands are interpreted as signed or unsigned 32-bit numbers. Column 'probability' lists the expected jump probability (range means that the actual value for a specific instruction depends on `imm32`). *Columns 'x86' and 'ARM' list the corresponding hardware instructions (following a `CMP` instruction).*
Taken CALL instruction pushes the values `A` and `pc` (program counter) onto the stack and then performs a forward jump relative to the value of `pc`. The forward offset is equal to `16 * (imm8[6:0] + 1)`. Maximum jump distance is therefore 128 instructions forward (this means that at least 4 correctly spaced CALL instructions are needed to form a loop in the program).
The RET instruction is taken only if the stack is not empty. Taken RET instruction pops the return address `raddr` from the stack (it's the instruction following the previous CALL), then pops a return value `retval` from the stack and sets `C = A XOR retval`. Finally, the instruction jumps back to `raddr`.