Updated specification

This commit is contained in:
tevador 2018-12-31 19:27:31 +01:00
parent 3caecc7646
commit bf8397b08d
2 changed files with 103 additions and 86 deletions

View file

@ -1,5 +1,4 @@
## RandomX instruction set
RandomX uses a simple low-level language (instruction set), which was designed so that any random bitstring forms a valid program.
@ -10,16 +9,19 @@ Each RandomX instruction has a length of 128 bits. The encoding is following:
*All flags are aligned to an 8-bit boundary for easier decoding.*
#### Opcode
There are 256 opcodes, which are distributed between various operations depending on their weight (how often they will occur in the program on average). The distribution of opcodes is following:
There are 256 opcodes, which are distributed between 30 instructions based on their weight (how often they will occur in the program on average). Instructions are divided into 5 groups:
|operation|number of opcodes||
|---------|-----------------|----|
|ALU operations|136|53.1%|
|FPU operations|78|30.5%|
|Control flow |42|16.4%|
|group|number of opcodes||comment|
|---------|-----------------|----|------|
|IA|115|44.9%|integer arithmetic operations
|IS|21|8.2%|bitwise shift and rotate
|FA|70|27.4%|floating point arithmetic operations
|FS|8|3.1%|floating point single-input operations
|CF|42|16.4%|control flow instructions (branches)
||**256**|**100%**
#### Operand A
The first operand is read from memory. The location is determined by the `loc(a)` flag:
The first 64-bit operand is read from memory. The location is determined by the `loc(a)` flag:
|loc(a)[2:0]|read A from|address size (W)
|---------|-|-|
@ -40,45 +42,56 @@ read_addr = reg(a)[W-1:0]
`W` is the address width from the above table. For reading from the scratchpad, `read_addr` is multiplied by 8 for 8-byte aligned access.
#### Operand B
The second operand is loaded either from a register or from an immediate value encoded within the instruction. The `reg(b)` flag encodes an integer register (ALU operations) or a floating point register (FPU operations).
The second operand is loaded either from a register or from an immediate value encoded within the instruction. The `reg(b)` flag encodes an integer register (instruction groups IA and IS) or a floating point register (instruction group FA). Instruction group FS doesn't use operand B.
|loc(b)[2:0]|read B from|
|---------|-|
|000|register `reg(b)`|
|001|register `reg(b)`|
|010|register `reg(b)`|
|011|register `reg(b)`|
|100|register `reg(b)`|
|101|register `reg(b)`|
|110|`imm8` or `imm32`|
|111|`imm8` or `imm32`|
|loc(b)[2:0]|B (IA)|B (IS)|B (FA)|B (FS)
|---------|-|-|-|-|
|000|integer `reg(b)`|integer `reg(b)`|floating point `reg(b)`|-
|001|integer `reg(b)`|integer `reg(b)`|floating point `reg(b)`|-
|010|integer `reg(b)`|integer `reg(b)`|floating point `reg(b)`|-
|011|integer `reg(b)`|integer `reg(b)`|floating point `reg(b)`|-
|100|integer `reg(b)`|`imm8`|floating point `reg(b)`|-
|101|integer `reg(b)`|`imm8`|floating point `reg(b)`|-
|110|`imm32`|`imm8`|floating point `reg(b)`|-
|111|`imm32`|`imm8`|floating point `reg(b)`|-
`imm8` is an 8-bit immediate value, which is used for shift and rotate ALU operations.
`imm8` is an 8-bit immediate value, which is used for shift and rotate integer instructions (group IS). Only bits 0-5 are used.
`imm32` is a 32-bit immediate value which is used for most operations. For operands larger than 32 bits, the value is sign-extended. For FPU instructions, the value is considered a signed 32-bit integer and then converted to a double precision floating point format.
`imm32` is a 32-bit immediate value which is used for integer instructions from group IA.
Floating point instructions don't use immediate values.
#### Operand C
The third operand is the location where the result is stored.
The third operand is the location where the result is stored. It can be a register or a 64-bit scratchpad location, depending on the value of flag `loc(c)`.
|loc\(c\)[2:0]|write C to|address size (W)
|---------|-|-|
|000|scratchpad|15 bits|
|001|scratchpad|11 bits|
|010|scratchpad|11 bits|
|011|scratchpad|11 bits|
|100|register `reg(c)`|-|
|101|register `reg(c)`|-|
|110|register `reg(c)`|-|
|111|register `reg(c)`|-|
|loc\(c\)[2:0]|address size (W)| C (IA, IS)|C (FA, FS)
|---------|-|-|-|-|-|
|000|15 bits|scratchpad|floating point `reg(c)`
|001|11 bits|scratchpad|floating point `reg(c)`
|010|11 bits|scratchpad|floating point `reg(c)`
|011|11 bits|scratchpad|floating point `reg(c)`
|100|15 bits|integer `reg(c)`|floating point `reg(c)`, scratchpad
|101|11 bits|integer `reg(c)`|floating point `reg(c)`, scratchpad
|110|11 bits|integer `reg(c)`|floating point `reg(c)`, scratchpad
|111|11 bits|integer `reg(c)`|floating point `reg(c)`, scratchpad
The `reg(c)` flag encodes an integer register (ALU operations) or a floating point register (FPU operations). For writing to the scratchpad, an integer register is always used and the write address is calculated as:
Integer operations write either to the scratchpad or to a register. Floating point operations always write to a register and can also write to the scratchpad. In that case, bit 3 of the `loc(c)` flag determines if the low or high half of the register is written:
|loc\(c\)[3]|write to scratchpad|
|------------|-----------------------|
|0|floating point `reg(c)[63:0]`
|1|floating point `reg(c)[127:64]`
The FPROUND instruction is an exception and always writes the low half of the register.
For writing to the scratchpad, an integer register is always used to calculate the address:
```
write_addr = 8 * (addr(c) XOR reg(c)[31:0])[W-1:0]
```
*CPUs are typically designed for a 2:1 load:store ratio, so each VM instruction performs on average 1 memory read and 0.5 write to memory.*
*CPUs are typically designed for a 2:1 load:store ratio, so each VM instruction performs on average 1 memory read and 0.5 writes to memory.*
#### imm8
An 8-bit immediate value that is used as the shift/rotate count by some ALU instructions and as the jump offset of the CALL instruction.
An 8-bit immediate value that is used as the shift/rotate count by group IS instructions and as the jump offset of the CALL instruction.
#### addr(a)
A 32-bit address mask that is used to calculate the read address for the A operand. It's sign-extended to 64 bits.
@ -88,33 +101,33 @@ A 32-bit address mask that is used to calculate the write address for the C oper
### ALU instructions
|weight|instruction|signed|A width|B width|C|C width|
|-|-|-|-|-|-|-|
|10|ADD_64|no|64|64|A + B|64|
|2|ADD_32|no|32|32|A + B|32|
|10|SUB_64|no|64|64|A - B|64|
|2|SUB_32|no|32|32|A - B|32|
|21|MUL_64|no|64|64|A * B|64|
|10|MULH_64|no|64|64|A * B|64|
|15|MUL_32|no|32|32|A * B|64|
|15|IMUL_32|yes|32|32|A * B|64|
|10|IMULH_64|yes|64|64|A * B|64|
|1|DIV_64|no|64|32|A / B|32|
|1|IDIV_64|yes|64|32|A / B|32|
|4|AND_64|no|64|64|A & B|64|
|2|AND_32|no|32|32|A & B|32|
|4|OR_64|no|64|64|A | B|64|
|2|OR_32|no|32|32|A | B|32|
|4|XOR_64|no|64|64|A ^ B|64|
|2|XOR_32|no|32|32|A ^ B|32|
|3|SHL_64|no|64|6|A << B|64|
|3|SHR_64|no|64|6|A >> B|64|
|3|SAR_64|yes|64|6|A >> B|64|
|6|ROL_64|no|64|6|A <<< B|64|
|6|ROR_64|no|64|6|A >>> B|64|
|weight|instruction|group|signed|A width|B width|C|C width|
|-|-|-|-|-|-|-|-|
|10|ADD_64|IA|no|64|64|`A + B`|64|
|2|ADD_32|IA|no|32|32|`A + B`|32|
|10|SUB_64|IA|no|64|64|`A - B`|64|
|2|SUB_32|IA|no|32|32|`A - B`|32|
|21|MUL_64|IA|no|64|64|`A * B`|64|
|10|MULH_64|IA|no|64|64|`A * B`|64|
|15|MUL_32|IA|no|32|32|`A * B`|64|
|15|IMUL_32|IA|yes|32|32|`A * B`|64|
|10|IMULH_64|IA|yes|64|64|`A * B`|64|
|1|DIV_64|IA|no|64|32|`A / B`|32|
|1|IDIV_64|IA|yes|64|32|`A / B`|32|
|4|AND_64|IA|no|64|64|`A & B`|64|
|2|AND_32|IA|no|32|32|`A & B`|32|
|4|OR_64|IA|no|64|64|`A | B`|64|
|2|OR_32|IA|no|32|32|`A | B`|32|
|4|XOR_64|IA|no|64|64|`A ^ B`|64|
|2|XOR_32|IA|no|32|32|`A ^ B`|32|
|3|SHL_64|IS|no|64|6|`A << B`|64|
|3|SHR_64|IS|no|64|6|`A >> B`|64|
|3|SAR_64|IS|yes|64|6|`A >> B`|64|
|6|ROL_64|IS|no|64|6|`A <<< B`|64|
|6|ROR_64|IS|no|64|6|`A >>> B`|64|
##### 32-bit operations
Instructions ADD_32, SUB_32, AND_32, OR_32, XOR_32 only use the low-order 32 bits of the input operands. The result of these operations is 32 bits long and bits 32-63 of C are zero.
Instructions ADD_32, SUB_32, AND_32, OR_32, XOR_32 only use the low-order 32 bits of the input operands. The result of these operations is 32 bits long and bits 32-63 of C are set to zero.
##### Multiplication
There are 5 different multiplication operations. MUL_64 and MULH_64 both take 64-bit unsigned operands, but MUL_64 produces the low 64 bits of the result and MULH_64 produces the high 64 bits. MUL_32 and IMUL_32 use only the low-order 32 bits of the operands and produce a 64-bit result. The signed variant interprets the arguments as signed integers. IMULH_64 takes two 64-bit signed operands and produces the high-order 64 bits of the result.
@ -129,24 +142,27 @@ The shift/rotate instructions use just the bottom 6 bits of the `B` operand (`im
### FPU instructions
|weight|instruction|conversion method|C|
|weight|instruction|group|C|
|-|-|-|-|
|20|FPADD|`convertSigned52`|A + B|
|20|FPSUB|`convertSigned52`|A - B|
|22|FPMUL|`convertSigned51`|A * B|
|8|FPDIV|`convertSigned51`|A / B|
|6|FPSQRT|`convert52`|sqrt(A)|
|2|FPROUND|`convertSigned52`|A|
|20|FPADD|FA|`A + B`|
|20|FPSUB|FA|`A - B`|
|22|FPMUL|FA|`A * B`|
|8|FPDIV|FA|`A / B`|
|6|FPSQRT|FS|`sqrt(abs(A))`|
|2|FPROUND|FS|`convertSigned52(A)`|
#### Rounding
FPU instructions conform to the IEEE-754 specification, so they must give correctly rounded results. Initial rounding mode is *roundTiesToEven*. Rounding mode can be changed by the `FPROUND` instruction. Denormal values are not be produced by any operation.
All floating point instructions apart FPROUND are vector instructions that operate on two packed double precision floating point values.
#### Conversion of operand A
Operand A is loaded from memory as a 64-bit signed integer and then converted to a double-precision floating point format using one of the following 3 methods:
Operand A is loaded from memory as a 64-bit value. All floating point instructions apart FPROUND interpret A as two packed 32-bit signed integers and convert them into two packed double precision floating point values.
* *convertSigned52* - Clears the 11 least-significant bits before conversion. This is done so the number fits exactly into the 52-bit mantissa without rounding.
* *convertSigned51* - Clears the 11 least-significant bits and sets the 12th bit before conversion. This is done so the number fits exactly into the 52-bit mantissa without rounding and avoids 0.
* *convert52* - Clears the 11 least-significant bits and the sign bit before conversion. This is done so the number fits exactly into the 52-bit mantissa without rounding and avoids negative values.
The FPROUND instruction has a scalar output and interprets A as a 64-bit signed integer. The 11 least-significant bits are cleared before conversion to a double precision format. This is done so the number fits exactly into the 52-bit mantissa without rounding. Output of FPROUND is always written into the lower half of the result register and only this lower half may be written into the scratchpad.
#### Rounding
FPU instructions conform to the IEEE-754 specification, so they must give correctly rounded results. Initial rounding mode is *roundTiesToEven*. Rounding mode can be changed by the `FPROUND` instruction. Denormal values must be flushed to zero.
#### NaN
If an operation produces NaN, the result is converted into positive zero. NaN results may never be written into registers or memory. Only division and multiplication must be checked for NaN results (`0.0 / 0.0` and `0.0 * Infinity` result in NaN).
##### FPROUND
The FPROUND instruction changes the rounding mode for all subsequent FPU operations depending on the two least-significant bits of A.
@ -165,12 +181,15 @@ The rounding modes are defined by the IEEE-754 standard.
### Control instructions
The following 2 control instructions are supported:
|weight|instruction|function|
|-|-|-|
|24|CALL|near procedure call|
|18|RET|return from procedure|
|weight|instruction|function|condition|
|-|-|-|-|
|20|CALL|near procedure call|(see condition table below)
|22|RET|return from procedure|stack is not empty
Both instructions are conditional. The condition function takes the lower 32 bits of integer register `reg(b)` and the value `imm32` and evaluates a condition based on the `loc(b)` flag:
Both instructions are conditional. If the condition evaluates to `false`, CALL and RET behave as "arithmetic no-op" and simply copy operand A into destination C without jumping.
##### CALL
The CALL instruction uses a condition function, which takes the lower 32 bits of integer register `reg(b)` and the value `imm32` and evaluates a condition based on the `loc(b)` flag:
|loc(b)[2:0]|signed|jump condition|probability|*x86*|*ARM*
|---|---|----------|-----|--|----|
@ -185,13 +204,10 @@ Both instructions are conditional. The condition function takes the lower 32 bit
The 'signed' column specifies if the operands are interpreted as signed or unsigned 32-bit numbers. Column 'probability' lists the expected jump probability (range means that the actual value for a specific instruction depends on `imm32`). *Columns 'x86' and 'ARM' list the corresponding hardware instructions (following a `CMP` instruction).*
In case the branch is not taken, both CALL and RET become "arithmetic no-op" `C = A`.
##### CALL
Taken CALL instruction pushes the values `A` and `pc` (program counter) onto the stack and then performs a forward jump relative to the value of `pc`. The forward offset is equal to `16 * (imm8[6:0] + 1)`. Maximum jump distance is therefore 128 instructions forward (this means that at least 4 correctly spaced CALL instructions are needed to form a loop in the program).
##### RET
The RET instruction behaves like "not taken" when the stack is empty. Taken RET instruction pops the return address `raddr` from the stack (it's the instruction following the previous CALL), then pops a return value `retval` from the stack and sets `C = A XOR retval`. Finally, the instruction jumps back to `raddr`.
The RET instruction is taken only if the stack is not empty. Taken RET instruction pops the return address `raddr` from the stack (it's the instruction following the previous CALL), then pops a return value `retval` from the stack and sets `C = A XOR retval`. Finally, the instruction jumps back to `raddr`.
## Reference implementation
A portable C++ implementation of all ALU and FPU instructions is available in [instructionsPortable.cpp](../src/instructionsPortable.cpp).

View file

@ -1,4 +1,5 @@
## RandomX virtual machine
RandomX is intended to be run efficiently on a general-purpose CPU. The virtual machine (VM) which runs RandomX code attempts to simulate a generic CPU using the following set of components:
@ -37,10 +38,10 @@ The control unit (CU) controls the execution of the program. It reads instructio
#### Stack
To simulate function calls, the VM uses a stack structure. The program interacts with the stack using the CALL and RET instructions. The stack has unlimited size and each stack element is 64 bits wide.
*Although there is no explicit limit of the stack size, the maximum theoretical size of the stack is 16 MiB for a program that contains only unconditional CALL instructions (the probability of randomly generating such program is about 5×10<sup>-912</sup>). In reality, the stack size will rarely exceed 1 MiB.*
*Although there is no explicit limit of the stack size, the maximum theoretical size of the stack is 16 MiB. Most programs will use around 4 KiB of stack.*
#### Register file
The VM has 8 integer registers `r0`-`r7` and 8 floating point registers `f0`-`f7`. All registers are 64 bits wide.
The VM has 8 integer registers `r0`-`r7` and 8 floating point registers `f0`-`f7`. The integer registers are 64 bits wide. The floating point registers are 128 bits wide and each stores two packed double precision numbers.
*The number of registers is low enough so that they can be stored in actual hardware registers on most CPUs.*
@ -48,7 +49,7 @@ The VM has 8 integer registers `r0`-`r7` and 8 floating point registers `f0`-`f7
The arithmetic logic unit (ALU) performs integer operations. The ALU can perform binary integer operations from 7 groups (addition, subtraction, multiplication, division, bitwise operations, shift, rotation) with operand sizes of 64 or 32 bits.
#### FPU
The floating-point unit performs IEEE-754 compliant math using 64-bit double precision floating point numbers. Five basic operations are available: addition, subtraction, multiplication, division and square root.
The floating-point unit performs IEEE-754 compliant math using 64-bit double precision floating point numbers. Five basic operations are available: addition, subtraction, multiplication, division and square root. All operations work with two packed double precision numbers.
#### Binary encoding
The VM stores and loads all data in little-endian byte order. Signed numbers are represented using two's complement.
The VM stores and loads all data in little-endian byte order. Signed integer numbers are represented using two's complement.