No description
Find a file
tevador b8ce504be6 Added comments to hashAes1Rx4 and fillAes1Rx4
Fixed gcc compilation
Added performance numbers
2019-02-09 19:32:53 +01:00
doc Documentation formatting 2019-02-09 16:09:55 +01:00
src Added comments to hashAes1Rx4 and fillAes1Rx4 2019-02-09 19:32:53 +01:00
tests ALU/FPU test: Fixed MSVC x86 build 2018-11-20 22:41:34 +01:00
.gitignore RandomX portable interpreter 2018-12-11 21:00:30 +01:00
LICENSE Initial commit 2018-11-01 00:34:52 +01:00
makefile Added comments to hashAes1Rx4 and fillAes1Rx4 2019-02-09 19:32:53 +01:00
README.md Added comments to hashAes1Rx4 and fillAes1Rx4 2019-02-09 19:32:53 +01:00

RandomX

RandomX is a proof-of-work (PoW) algorithm that is optimized for general-purpose CPUs. RandomX uses random code execution (hence the name) together with several memory-hard techniques to achieve the following goals:

  • Prevent the development of a single-chip ASIC
  • Minimize the efficiency advantage of specialized hardware compared to a general-purpose CPU

Design

The core of RandomX is a virtual machine (VM), which can be summarized by the following schematic:

Imgur

Notable parts of the RandomX VM are:

  • a large read-only 4 GiB dataset
  • a 2 MiB scratchpad (read/write), which is structured into three levels L1, L2 and L3
  • 8 integer and 12 floating point registers
  • an arithmetic logic unit (ALU)
  • a floating point unit (FPU)
  • a 2 KiB program buffer

The structure of the VM mimics the components that are found in a typical general purpose computer equipped with a CPU and a large amount of DRAM. The scratchpad is designed to fit into the CPU cache. The first 16 KiB and 256 KiB of the scratchpad are used more often take advantage of the faster L1 and L2 caches. The ratio of random reads from L1/L2/L3 is approximately 9:3:1, which matches the inverse latencies of typical CPU caches.

The VM executes programs in a special instruction set, which was designed in such way that any random 8-byte word is a valid instruction and any sequence of valid instructions is a valid program. For more details see RandomX ISA documentation. Because there are no "syntax" rules, generating a random program is as easy as filling the program buffer with random data. A RandomX program consists of 256 instructions. See program.inc as an example of a RandomX program translated into x86-64 assembly.

Hash calculation

Calculating a RandomX hash consists of initializing the 2 MiB scratchpad with random data, executing 8 RandomX loops and calculating a hash of the scratchpad.

Each RandomX loop is repeated 2048 times. The loop body has 4 parts:

  1. The values of all registers are loaded randomly from the scratchpad (L3)
  2. The RandomX program is executed
  3. A random block is loaded from the dataset and mixed with integer registers
  4. All register values are stored into the scratchpad (L3)

Hash of the register state after 2048 interations is used to initialize the random program for the next loop. The use of 8 different programs in the course of a single hash calculation prevents mining strategies that search for "easy" programs.

The loads from the dataset are fully prefetched, so they don't slow down the loop.

RandomX uses the Blake2b cryptographic hash function. Special hashing functions fillAes1Rx4 and hashAes1Rx4 based on AES encryption are used to initialize and hash the scratchpad (hashAes1Rx4.cpp).

Hash verification

RandomX is a symmetric PoW algorithm, so the verifying party has to repeat the same steps as when a hash is calculated.

However, to allow hash verification on devices that cannot store the whole 4 GiB dataset, RandomX allows a time-memory tradeoff by using just 256 MiB of memory at the cost of 16 times more random memory accesses. See Dataset initialization for more details.

Performance

Preliminary mining performance with the x86-64 JIT compiled VM:

CPU RAM threads hashrate [H/s] comment
AMD Ryzen 1700 DDR4-2933 8 4100
Intel i5-3230M DDR3-1333 1 280 without large pages
Intel i7-8550U DDR4-2400 4 1200 limited by thermals
Intel i5-2500K DDR3-1333 3 1350

Hash verification is performed using the portable interpreter in "light-client mode" and takes 30-70 ms depending on RAM latency and CPU clock speed. Hash verification in "mining mode" takes 2-4 ms.

Documentation

FAQ

Can RandomX run on a GPU?

We don't expect GPUs will ever be competitive in mining RandomX. The reference miner is CPU-only.

RandomX was designed to be efficient on CPUs. Designing an algorithm compatible with both CPUs and GPUs brings too many limitations and ultimately decreases ASIC resistance. CPUs have the advantage of not needing proprietary drivers and most CPU architectures support a large common subset of primitive operations.

Additionally, targeting CPUs allows for more decentralized mining for several reasons:

  • Every computer has a CPU and even laptops will be able to mine efficiently.
  • CPU mining is easier to set up - no driver compatibility issues, BIOS flashing etc.
  • CPU mining is more difficult to centralize because computers can usually have only one CPU except for expensive server parts.

Does RandomX facilitate botnets/malware mining or web mining?

Quite the opposite. Efficient mining requires 4 GiB of memory, which is very difficult to hide in an infected computer and disqualifies many low-end machines. Web mining is nearly impossible due to the large memory requirement and the need for a rather lengthy initialization of the dataset.

Since RandomX uses floating point calculations, how can it give reproducible results on different platforms?

RandomX uses only operations that are guaranteed to give correctly rounded results by the IEEE 754 standard: addition, subtraction, multiplication, division and square root. Special care is taken to avoid corner cases such as NaN values or denormals.

Acknowledgements

The following people have contributed to the design of RandomX:

RandomX uses some source code from the following 3rd party repositories:

Donations

XMR:

845xHUh5GvfHwc2R8DVJCE7BT2sd4YEcmjG8GNSdmeNsP5DTEjXd1CNgxTcjHjiFuthRHAoVEJjM7GyKzQKLJtbd56xbh7V