From b0faa05fe8ed59395da5f8ad7f6a89baefae10b7 Mon Sep 17 00:00:00 2001
From: tevador
Date: Thu, 14 Mar 2019 21:46:38 +0100
Subject: [PATCH] Updated design notes

---
 doc/design.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/doc/design.md b/doc/design.md
index c5832a4..0f72f5c 100644
--- a/doc/design.md
+++ b/doc/design.md
@@ -83,7 +83,7 @@ The COND instructions use the common condition flags that are supported by most
 
 ### Memory access
 
-RandomX randomly accesses a large buffer of data (Dataset) 16384 times for each hash calculation. Since the Dataset must be stored in DRAM, it provides a natural parallelization limit, because DRAM cannot do more than about 25 million random accesses per second per channel. Each separately addressable DRAM channel allows a throughput of around 1500 H/s.
+RandomX randomly reads from large buffer of data (Dataset) 16384 times for each hash calculation. Since the Dataset must be stored in DRAM, it provides a natural parallelization limit, because DRAM cannot do more than about 25 million random accesses per second per bank group. Each separately addressable bank group allows a throughput of around 1500 H/s.
 
 All Dataset accesses read whole CPU cache line (64 bytes) and are fully prefetched. The time to execute one program iteration described in chapter 4.6.2 of the Specification is about the same as typical DRAM access latency.
 
@@ -95,6 +95,10 @@ Because 256 MiB is small enough to be included on-chip, RandomX uses a high-late
 
 Using less than 256 MiB of memory is not possible due to the use of tradeoff-resistant Argon2d with 3 iterations. When using 3 iterations (passes), halving the memory usage increases computational cost 3423 times for the best tradeoff attack ([Reference](https://eprint.iacr.org/2015/430.pdf) in Table 2 on page 8).
 
+#### Scratchpad
+
+The Scratchpad is used as read-write memory. Its size was selected to fit entirely into CPU cache. Programs make, on average, 39 reads (instructions IADD_M, ISUB_M, IMUL_M, IMULH_M, ISMULH_M, IXOR_M, FADD_M, FSUB_M, FDIV_M, COND_M) and 16 writes (instruction ISTORE) to the Scratchpad per program iteration. This is close to a 2:1 read/write ratio, which CPUs are optimized for.
+
 ### Choice of hashing function
 
 RandomX uses Blake2b as its main cryptographically secure hashing function. Blake2b was specifically designed to be fast in software, especially on modern 64-bit processors, where it's around three times faster than SHA-3 and can run at a speed of around 3 clock cycles per byte of input.
@@ -108,7 +112,7 @@ From a cryptographic standpoint, SquareHash achieves full [Avalanche effect](htt
 
 (x+9507361525245169745)<sup>4398046511104</sup> mod 2<sup>64</sup>+1,
 
-where 4398046511104 = 2<sup>42</sup>. The addition of the carry was removed to improve CPU performance. The constant `9507361525245169745` is added to make SquareHash sensitive to zero.
+where 4398046511104 = 2<sup>42</sup>. The addition of the carry was removed to improve CPU performance. The constant `9507361525245169745` is added to make SquareHash sensitive to zero (see chapter 3.4 of Specification).
 
 #### Generator