Initial draft of RandomX

2024-08-15 00:23:14 +00:00 · 2018-11-01 00:46:39 +01:00 · 2018-11-01 00:46:39 +01:00 · 07a8318a45
commit 07a8318a45
parent a0abe8b44c
1 changed files with 163 additions and 1 deletions
--- a/README.md
+++ b/README.md
@ -1,2 +1,164 @@
 # RandomX
-Experimental proof of work algorithm based on random code execution
+RandomX ("random ex") is an experimental proof of work (PoW) algorithm that uses random code execution to achieve ASIC resistance.
+
+RandomX uses a simple low-level language (instruction set) to describe a variety of random programs. The instruction set was designed specifically for this proof of work algorithm, because existing languages and instruction sets are designed for a different goal (actual software development) and thus usually have a complex syntax and unnecessary flexibility.
+
+## Virtual machine
+RandomX is intended to be run efficiently and easily on a general-purpose CPU. The virtual machine (VM) that runs RandomX code attempts to simulate a CPU using the following set of components:
+
+![RandomX virtual machine components](https://i.imgur.com/xlhuF2K.png)
+
+#### DRAM
+The VM has access to 4 GiB of external memory in read-only mode. The DRAM memory blob is static within a single PoW epoch. The exact algorithm to generate the DRAM blob and its update schedule are to be determined.
+
+#### MMU
+The memory management unit (MMU) interfaces the CPU with the DRAM blob. The purpose of the MMU is to translate the random memory accesses generated by the random program into a CPU-friendly access pattern, where memory reads are not bound by DRAM access latency. The MMU splits the 4 GiB DRAM blob into 64-byte blocks (corresponding to the L1 cache line size of a typical CPU). Data within one block is always read sequentially in eight reads (8x8 bytes). Blocks are read mostly sequentially apart from occasional random jumps that happen on average every 1024 blocks. The address of the next block to be read is determined 1 block ahead of time to enable efficient prefetching. The MMU uses three internal registers:
+* **m0** - Address of the next quadword to be read from memory (32-bit, 8-byte aligned).
+* **m1** - Address of the next block to be read from memory (32-bit, 64-byte aligned).
+* **mx** - Random 64-bit counter that determines whether reading continues sequentially or jumps to a random block. When an address `addr` is passed to the MMU, it performs `mx ^= addr` and checks whether the lowest 10 bits of `mx` are zero. If they are, the adjacent 32 bits (bits 10-41) are copied to register `m1` and aligned to 64 bytes.
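+
+The register behaviour described above can be sketched in Python as follows. This is a minimal, non-authoritative model with several assumptions:
+
+* The `MMU` class and its `read` method are hypothetical names.
+* Quadwords are assumed to be stored little-endian.
+* A jump is assumed to take effect once the current block is finished, with the new `m1` used at that point.
+
+The modulo on the DRAM offset only exists so the sketch can run on a small test blob instead of 4 GiB.
+
+```python
+MASK32 = (1 << 32) - 1
+MASK64 = (1 << 64) - 1
+
+
+class MMU:
+    """Hypothetical model of the m0, m1 and mx registers."""
+
+    def __init__(self, dram: bytes, m0: int) -> None:
+        self.dram = dram                   # read-only blob; length must be a multiple of 64
+        self.m0 = m0 & MASK32 & ~63        # next quadword, starts 64-byte aligned
+        self.m1 = (self.m0 + 64) & MASK32  # next block (m1 = m0 + 64 initially)
+        self.mx = 0                        # random jump counter
+
+    def read(self, addr: int) -> int:
+        # Mix the address supplied by the program (the value of r(p1)) into mx.
+        self.mx = (self.mx ^ addr) & MASK64
+        if (self.mx & 0x3FF) == 0:
+            # Lowest 10 bits are zero: bits 10-41 become the next block, 64-byte aligned.
+            self.m1 = (self.mx >> 10) & MASK32 & ~63
+        offset = self.m0 % len(self.dram)  # wrap only to support a small test blob
+        value = int.from_bytes(self.dram[offset:offset + 8], "little")
+        self.m0 = (self.m0 + 8) & MASK32
+        if self.m0 % 64 == 0:
+            # Block finished: continue at m1 and prefetch the block after it.
+            self.m0 = self.m1
+            self.m1 = (self.m1 + 64) & MASK32
+        return value
+```
+
+Eight consecutive reads consume one block. After that, reading continues at `m1`, which is the following block unless a jump was triggered.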
+
+#### Program
+The actual program is stored in an 8 KiB ring buffer structure. Each program consists of 1024 random 64-bit instructions (1024 x 8 bytes = 8 KiB). The ring buffer structure makes sure that the program forms a closed infinite loop.
+
+#### Control unit
+The control unit (CU) controls the execution of the program. It reads instructions from the program buffer and sends commands to the other units. The CU contains 3 internal registers:
+* **pc** - Address of the next instruction in the program buffer to be executed (64-bit, 8-byte aligned).
+* **sp** - Address of the top of the stack (64-bit, 8-byte aligned).
+* **ic** - Instruction counter: the number of instructions remaining before the program terminates. The initial value is 65536 and the register is decremented after each executed instruction.
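+
+The following is a minimal sketch of the fetch loop, assuming the ring buffer is indexed by byte address. `pc` advances by 8 and wraps at 8 KiB, and `ic` counts down to zero. The names `fetch` and `run` and the `execute` callback are hypothetical. `execute` stands for whatever unit handles the instruction and returns the (possibly changed) `pc`.
+
+```python
+from typing import Callable, List, Tuple
+
+PROGRAM_SIZE = 1024 * 8  # 1024 instructions x 8 bytes = 8 KiB
+
+
+def fetch(program: List[int], pc: int) -> Tuple[int, int]:
+    """Return the instruction at byte address pc and the address of the next one."""
+    instr = program[(pc % PROGRAM_SIZE) // 8]
+    return instr, (pc + 8) % PROGRAM_SIZE  # wrap around: the program is a closed loop
+
+
+def run(program: List[int], execute: Callable[[int, int], int], ic: int = 65536) -> int:
+    """Execute ic instructions and return the final pc."""
+    pc = 0
+    while ic > 0:
+        instr, pc = fetch(program, pc)  # pc now points at the next instruction
+        pc = execute(instr, pc)         # branch instructions may return a new pc
+        ic -= 1                         # decremented after each executed instruction
+    return pc
+```
+
+When `execute` runs, `pc` already points at the following instruction. A CALL can therefore push it directly as the return address.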
+
+#### Stack
+To simulate function calls, the VM uses a stack structure. The program interacts with the stack using the CALL, CALLR and RET instructions. The stack has unlimited size and each stack element is 64 bits wide.
+
+#### Register file
+The VM has 32 integer registers r0-r31 and 32 floating point registers f0-f31. All registers are 64 bits wide.
+
+#### ALU
+The arithmetic logic unit (ALU) performs integer operations. The ALU can perform binary integer operations from 11 groups (ADD, SUB, MUL, DIV, AND, OR, XOR, SHL, SHR, ROL, ROR) with various operand sizes.
+
+#### FPU
+The floating-point unit (FPU) performs IEEE-754 compliant math using 64-bit double-precision floating-point numbers. There are 4 binary operations (ADD, SUB, MUL, DIV) and one unary operation (SQRT).
+
+## Instruction set
+The 64-bit instruction is structured as follows:
+
+![RandomX instruction format](https://i.imgur.com/TbFlCux.png)
+
+##### Opcode
+There are 256 opcodes, which are distributed between the operations according to their weight (how often each operation occurs in a program on average). The distribution of opcodes is as follows:
+
+|operation|number of opcodes|share|
+|---------|-----------------|----|
+|ALU operations|TBD|TBD|
+|FPU operations|TBD|TBD|
+|branching|33|13%|
+
+##### p1
+p1 (truncated to 5 bits) selects the register that holds the DRAM address of the first operand. It is always an integer register, even for floating-point operations. The content of the register is passed to the MMU as the "address" for reading from DRAM.
+##### p2
+p2 (truncated to 5 bits) determines the number of the second operand register. It is an integer register for ALU operations and a floating point register for FPU operations.
+##### imm0
+An 8-bit immediate value that can be used as an input parameter instead of register p2.
+##### p3
+p3 (truncated to 5 bits) determines the number of the output register. It is an integer register for ALU operations and a floating-point register for FPU operations. The result of the operation never overwrites the current value of the output register. Instead, the two values are combined using XOR for integer registers and addition for floating-point registers. This ensures that the value of a register depends on all previous operations that output to it.
+##### imm1
+An 8-bit immediate value that can be used by the CALL instruction instead of register p3.
+##### imm2
+A 32-bit immediate value that is used by some ALU operations as input instead of register p2.
+
+In the following description of instructions, r(x) refers to an integer register number x, f(x) refers to a floating point register number x and {x} represents the value obtained from the MMU when the value of integer register r(x) is passed as the read address.
+
+### ALU instructions
+
+All ALU instructions take 2 operands A and B and produce result C. If the operand size is smaller than the input size, the input is truncated. If the operand size is larger than the input size, the input is sign-extended for signed operations and zero-extended for unsigned (this applies to operations using *imm2*).
+
+After C is calculated, every ALU instruction performs `r(p3) ^= C`. 
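+
+The width rules above can be illustrated with a small Python subset of the table. This is a sketch, not a reference implementation:
+
+* The function names are hypothetical.
+* Only six opcodes are covered.
+* Results narrower than 64 bits are assumed to be zero-extended before the XOR into `r(p3)`.
+
+`a` is the value `{p1}`. `b` is `r(p2)`, `imm2` or `imm0`, depending on the opcode.
+
+```python
+MASK64 = (1 << 64) - 1
+
+
+def trunc(x: int, width: int) -> int:
+    """Keep the low `width` bits (unsigned view)."""
+    return x & ((1 << width) - 1)
+
+
+def to_signed(x: int, width: int) -> int:
+    """Interpret the low `width` bits of x as a two's complement integer."""
+    x = trunc(x, width)
+    return x - (1 << width) if x >> (width - 1) else x
+
+
+def alu(op: str, a: int, b: int) -> int:
+    """Compute C for an illustrative subset of the ALU opcodes."""
+    if op == "ADD_U32":   # 32-bit operands, 32-bit result
+        return trunc(trunc(a, 32) + trunc(b, 32), 32)
+    if op == "ADD_UC64":  # unsigned: 32-bit imm2 is zero-extended to 64 bits
+        return trunc(trunc(a, 64) + trunc(b, 32), 64)
+    if op == "MUL_I32":   # signed 32 x 32 -> 64-bit product
+        return trunc(to_signed(a, 32) * to_signed(b, 32), 64)
+    if op == "MUL_I16":   # signed 16 x 16 -> 32-bit product
+        return trunc(to_signed(a, 16) * to_signed(b, 16), 32)
+    if op == "SHR_I64":   # arithmetic shift right, B truncated to 6 bits
+        return trunc(to_signed(a, 64) >> trunc(b, 6), 64)
+    if op == "ROL_U64":   # rotate left (A <<< B), B truncated to 6 bits
+        s, a = trunc(b, 6), trunc(a, 64)
+        return trunc((a << s) | (a >> (64 - s)), 64)
+    raise ValueError(f"opcode not covered by this sketch: {op}")
+
+
+def alu_step(regs, p3: int, op: str, a: int, b: int) -> None:
+    # The result is never written directly: it is XORed into r(p3).
+    regs[p3] ^= alu(op, a, b)
+```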
+
+|opcodes|instruction|signed|A|A width|B|B width|C|C width|
+|-|-|-|-|-|-|-|-|-|
+|TBD|ADD_U64|no|{p1}|64|r(p2)|64|A + B|64|
+|TBD|ADD_U32|no|{p1}|32|r(p2)|32|A + B|32|
+|TBD|ADD_U16|no|{p1}|16|r(p2)|16|A + B|16|
+|TBD|ADD_UC64|no|{p1}|64|imm2|64|A + B|64|
+|TBD|ADD_UC32|no|{p1}|32|imm2|32|A + B|32|
+|TBD|SUB_U64|no|{p1}|64|r(p2)|64|A - B|64|
+|TBD|SUB_U32|no|{p1}|32|r(p2)|32|A - B|32|
+|TBD|SUB_U16|no|{p1}|16|r(p2)|16|A - B|16|
+|TBD|SUB_UC64|no|{p1}|64|imm2|64|A - B|64|
+|TBD|SUB_UC32|no|{p1}|32|imm2|32|A - B|32|
+|TBD|MUL_U64|no|{p1}|64|r(p2)|64|A * B|64|
+|TBD|MUL_U32|no|{p1}|32|r(p2)|32|A * B|64|
+|TBD|MUL_I32|yes|{p1}|32|r(p2)|32|A * B|64|
+|TBD|MUL_U16|no|{p1}|16|r(p2)|16|A * B|32|
+|TBD|MUL_I16|yes|{p1}|16|r(p2)|16|A * B|32|
+|TBD|MUL_UC64|no|{p1}|64|imm2|64|A * B|64|
+|TBD|MUL_UC32|no|{p1}|32|imm2|32|A * B|64|
+|TBD|MUL_IC32|yes|{p1}|32|imm2|32|A * B|64|
+|TBD|DIV_U64|no|{p1}|64|r(p2)|32|A / B, A % B|64|
+|TBD|DIV_I64|yes|{p1}|64|r(p2)|32|A / B, A % B|64|
+|TBD|DIV_U32|no|{p1}|32|r(p2)|16|A / B, A % B|32|
+|TBD|DIV_I32|yes|{p1}|32|r(p2)|16|A / B, A % B|32|
+|TBD|AND_U64|no|{p1}|64|r(p2)|64|A & B|64|
+|TBD|AND_U32|no|{p1}|32|r(p2)|32|A & B|32|
+|TBD|AND_U16|no|{p1}|16|r(p2)|16|A & B|16|
+|TBD|AND_UC64|no|{p1}|64|imm2|64|A & B|64|
+|TBD|AND_UC32|no|{p1}|32|imm2|32|A & B|32|
+|TBD|OR_U64|no|{p1}|64|r(p2)|64|A &#124; B|64|
+|TBD|OR_U32|no|{p1}|32|r(p2)|32|A &#124; B|32|
+|TBD|OR_U16|no|{p1}|16|r(p2)|16|A &#124; B|16|
+|TBD|OR_UC64|no|{p1}|64|imm2|64|A &#124; B|64|
+|TBD|OR_UC32|no|{p1}|32|imm2|32|A &#124; B|32|
+|TBD|XOR_U64|no|{p1}|64|r(p2)|64|A ^ B|64|
+|TBD|XOR_U32|no|{p1}|32|r(p2)|32|A ^ B|32|
+|TBD|XOR_U16|no|{p1}|16|r(p2)|16|A ^ B|16|
+|TBD|XOR_UC64|no|{p1}|64|imm2|64|A ^ B|64|
+|TBD|XOR_UC32|no|{p1}|32|imm2|32|A ^ B|32|
+|TBD|SHL_U64|no|{p1}|64|r(p2)|6|A << B|64|
+|TBD|SHL_UC64|no|{p1}|64|imm0|6|A << B|64|
+|TBD|SHR_U64|no|{p1}|64|r(p2)|6|A >> B|64|
+|TBD|SHR_UC64|no|{p1}|64|imm0|6|A >> B|64|
+|TBD|SHR_I64|yes|{p1}|64|r(p2)|6|A >> B|64|
+|TBD|SHR_IC64|yes|{p1}|64|imm0|6|A >> B|64|
+|TBD|ROL_U64|no|{p1}|64|r(p2)|6|A <<< B|64|
+|TBD|ROL_UC64|no|{p1}|64|imm0|6|A <<< B|64|
+|TBD|ROR_U64|no|{p1}|64|r(p2)|6|A >>> B|64|
+|TBD|ROR_UC64|no|{p1}|64|imm0|6|A >>> B|64|
+
+### FPU instructions
+Floating point instructions take two operands A and B and produce result C (except the SQRT_F64 instruction, which only takes one operand). After C is calculated, every FPU instruction performs `f(p3) += C`. The order of operations must be preserved since floating point math is not associative.
+
+|opcodes|instruction|A|B|C|
+|-|-|-|-|-|
+|TBD|ADD_F64|double({p1})|f(p2)|A + B|
+|TBD|SUB_F64|double({p1})|f(p2)|A - B|
+|TBD|MUL_F64|double({p1})|f(p2)|A * B|
+|TBD|DIV_F64|double({p1})|f(p2)|A / B|
+|TBD|SQRT_F64|abs(double({p1}))|-|sqrt(A)|
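+
+The following is a sketch of the FPU step in Python; `fpu_step` is a hypothetical name. Two points are assumptions that the draft does not settle:
+
+* `double({p1})` is read as a numeric conversion of `{p1}`, interpreted as a signed 64-bit integer, rather than a bit-level reinterpretation.
+* Division by zero follows IEEE-754. Python raises `ZeroDivisionError` for `x / 0.0`, so the sketch handles that case explicitly.
+
+```python
+import math
+
+
+def int64_to_double(x: int) -> float:
+    """Assumed meaning of double({p1}): signed 64-bit integer -> nearest double."""
+    x &= (1 << 64) - 1
+    return float(x - (1 << 64) if x >> 63 else x)
+
+
+def fpu_step(fregs, p3, op, mem_value, b):
+    """Compute C from A = double({p1}) and B = f(p2), then perform f(p3) += C."""
+    a = int64_to_double(mem_value)
+    if op == "ADD_F64":
+        c = a + b
+    elif op == "SUB_F64":
+        c = a - b
+    elif op == "MUL_F64":
+        c = a * b
+    elif op == "DIV_F64":
+        if b != 0.0:
+            c = a / b
+        elif a == 0.0:
+            c = math.nan  # 0/0 is NaN under IEEE-754
+        else:
+            c = math.copysign(math.inf, a) * math.copysign(1.0, b)  # x/0 is +-inf
+    elif op == "SQRT_F64":
+        c = math.sqrt(abs(a))  # single operand; abs() keeps the argument non-negative
+    else:
+        raise ValueError(f"unknown FPU opcode: {op}")
+    fregs[p3] += c  # combine by addition instead of overwriting
+```
+
+Because addition is not associative in floating point, the steps must be applied in program order to reproduce the same register values.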
+
+### Branch instructions
+The CU supports 3 branch instructions:
+
+|opcodes|instruction|function|
+|-|-|-|
+|223-242|CALL|conditional near procedure call with static offset|
+|243-246|CALLR|conditional near procedure call with register offset|
+|247-255|RET|conditional return from procedure|
+
+All three instructions are conditional: the jump happens only if `(r(p2) & 0xFFFFFFFF) < imm2`. If the branch is not taken, all three instructions perform `r(p3) ^= {p1}` ("arithmetic no-op").
+
+##### CALL and CALLR
+When the branch is taken, both CALL and CALLR instructions push the values `{p1}` (value read from DRAM) and `pc` (program counter) onto the stack and then perform a forward jump relative to the value of `pc`. The forward offset is equal to `8 * (imm1 + 1)` for the CALL instruction and `8 * ((r(p3) & 0xFF) + 1)` for the CALLR instruction. Maximum jump distance is therefore 256 instructions forward (this means that at least 4 correctly spaced CALL/CALLR instructions are needed to form a loop in the program).
+
+##### RET
+When the branch is taken, the RET instruction pops the return address `raddr` from the stack (the address of the instruction following the corresponding CALL or CALLR). It then pops a return value `retval` from the stack and performs `r(p3) ^= retval`. Finally, the instruction jumps back to `raddr`.
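+
+The three branch instructions can be sketched together in Python, with these assumptions:
+
+* The `CU` class, the `branch` function and its parameter names are hypothetical.
+* `p1_value` stands for `{p1}`.
+* The stack is modelled as a Python list, so `sp` is implicit.
+* RET on an empty stack is treated as "not taken", since the draft does not define that case.
+
+```python
+PROGRAM_SIZE = 1024 * 8  # ring buffer size in bytes
+
+
+class CU:
+    """Hypothetical control-unit state used by the branch instructions."""
+
+    def __init__(self) -> None:
+        self.pc = 0      # address of the next instruction (already past the branch)
+        self.stack = []  # unbounded stack; the last element is the top
+
+
+def branch(cu, regs, op, p1_value, p2, p3, imm1, imm2):
+    """Execute CALL, CALLR or RET; p1_value is {p1}, the quadword read via the MMU."""
+    if op not in ("CALL", "CALLR", "RET"):
+        raise ValueError(f"not a branch opcode: {op}")
+    taken = (regs[p2] & 0xFFFFFFFF) < imm2
+    if op == "RET" and not cu.stack:
+        taken = False  # assumption: RET on an empty stack is undefined in the draft
+    if not taken:
+        regs[p3] ^= p1_value  # "arithmetic no-op"
+        return
+    if op == "RET":
+        raddr = cu.stack.pop()   # pushed last by CALL/CALLR
+        retval = cu.stack.pop()  # pushed first by CALL/CALLR
+        regs[p3] ^= retval
+        cu.pc = raddr
+        return
+    cu.stack.append(p1_value)  # becomes retval
+    cu.stack.append(cu.pc)     # becomes raddr
+    k = imm1 if op == "CALL" else regs[p3] & 0xFF
+    cu.pc = (cu.pc + 8 * (k + 1)) % PROGRAM_SIZE  # at most 256 instructions forward
+```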
+
+## Program generation
+The program is initialized from a 256-bit seed value using a suitable PRNG. The program is generated in this order:
+1. All 1024 instructions are generated as a list of random 64-bit integers.
+2. Initial values of all integer registers r0-r31 are generated as random 64-bit integers.
+3. Initial values of all floating point registers f0-f31 are generated as random 64-bit integers converted to a double precision floating point format.
+4. The initial value of the `m0` register is generated as a random 32-bit value with the last 6 bits cleared (64-byte aligned).
+5. The remaining registers are initialized as `pc = 0`, `sp = 0`, `ic = 65536`, `m1 = m0 + 64`, `mx = 0`.
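+
+The generation order can be sketched as below; `generate` is a hypothetical name. The draft only asks for "a suitable PRNG", so the following are all assumptions:
+
+* SHAKE-256 from Python's `hashlib` is used as the random stream.
+* Words are extracted little-endian.
+* `m0` is taken from the low 32 bits of a 64-bit word.
+* The floating-point values are converted from signed 64-bit integers.
+
+```python
+import hashlib
+import struct
+
+MASK32 = (1 << 32) - 1
+
+
+def generate(seed: bytes):
+    """Derive the initial VM state from a 256-bit seed, in the order listed above."""
+    if len(seed) != 32:
+        raise ValueError("seed must be 256 bits (32 bytes)")
+    n_words = 1024 + 32 + 32 + 1  # instructions, r0-r31, f0-f31, m0
+    stream = hashlib.shake_256(seed).digest(8 * n_words)  # stand-in PRNG output
+    words = list(struct.unpack(f"<{n_words}Q", stream))   # 64-bit little-endian words
+    program = words[:1024]                                 # 1. instructions
+    r = words[1024:1056]                                   # 2. integer registers
+    f = [float(w - (1 << 64) if w >> 63 else w)            # 3. float registers
+         for w in words[1056:1088]]
+    m0 = words[1088] & MASK32 & ~63                        # 4. 64-byte aligned
+    state = {"pc": 0, "sp": 0, "ic": 65536,                # 5. remaining registers
+             "m1": (m0 + 64) & MASK32, "mx": 0}
+    return program, r, f, m0, state
+```
+
+Any deterministic generator would do for this illustration. What matters is that the same seed always yields the same program and initial state.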
+
+
+## Result
+When the program terminates (the `ic` register reaches 0), the register file and the stack are hashed using the BLAKE2b hash function to get the final PoW value. The generation/execution can be chained multiple times to discourage mining strategies that search for programs with particular properties.
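+
+The following is a sketch of the final hashing step using Python's `hashlib.blake2b`; `pow_hash` is a hypothetical name. The draft does not specify the serialization, so these choices are assumptions:
+
+* Layout: r0-r31 as little-endian 64-bit integers, then f0-f31 as IEEE-754 doubles, then the stack from bottom to top.
+* Digest size: 32 bytes, chosen so the output could be fed back as the 256-bit seed when chaining.
+
+```python
+import hashlib
+import struct
+
+
+def pow_hash(r, f, stack) -> bytes:
+    """Hash the register file and the stack with BLAKE2b (assumed serialization)."""
+    h = hashlib.blake2b(digest_size=32)
+    h.update(struct.pack("<32Q", *r))  # r0-r31 as unsigned 64-bit integers
+    h.update(struct.pack("<32d", *f))  # f0-f31 as IEEE-754 doubles
+    for item in stack:                 # stack elements from bottom to top
+        h.update(struct.pack("<Q", item))
+    return h.digest()
+```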