diff --git a/doc/design.md b/doc/design.md index 14aa7b8..cb11a38 100644 --- a/doc/design.md +++ b/doc/design.md @@ -297,17 +297,24 @@ Using less than 256 MiB of memory is not possible due to the use of tradeoff-res ### 3.1 AesGenerator1R -AesGenerator1R was designed for the fastest possible generation of pseudorandom data to fill the Scratchpad. It takes advantage of hardware accelerated AES in modern CPUs. Only one AES round is performed per 16 bytes of output, which results in throughput exceeding 20 GB/s in most modern CPUs. While 1 AES round is not sufficient for a good distribution of random values, this is not an issue because the purpose is just to initialize the Scratchpad with random non-zero data. +AesGenerator1R was designed for the fastest possible generation of pseudorandom data to fill the Scratchpad. It takes advantage of hardware accelerated AES in modern CPUs. Only one AES round is performed per 16 bytes of output, which results in throughput exceeding 20 GB/s in most modern CPUs. + +AesGenerator1R gives a good output distribution provided that it's initialized with a sufficiently 'random' initial state (see Appendix F). ### 3.2 AesGenerator4R -AesGenerator4R uses 4 AES rounds to generate pseudorandom data for Program Buffer initialization. Since 2 AES rounds are sufficient for full avalanche of all input bits [[28](https://csrc.nist.gov/csrc/media/projects/cryptographic-standards-and-guidelines/documents/aes-development/rijndael-ammended.pdf)], AesGenerator4R provides an excellent output distribution while maintaining very good performance. +AesGenerator4R uses 4 AES rounds to generate pseudorandom data for Program Buffer initialization. 
Since 2 AES rounds are sufficient for full avalanche of all input bits [[28](https://csrc.nist.gov/csrc/media/projects/cryptographic-standards-and-guidelines/documents/aes-development/rijndael-ammended.pdf)], AesGenerator4R has excellent statistical properties (see Appendix F) while maintaining very good performance. The reversible nature of this generator is not an issue since the generator state is always initialized using the output of a non-reversible hashing function (Blake2b). ### 3.3 AesHash1R -AesHash was designed for the fastest possible calculation of the Scratchpad fingerprint. It interprets the Scratchpad as a set of AES round keys, so it's equivalent to AES encryption with 32768 rounds. Two extra rounds are performed at the end to ensure avalanche of all Scratchpad bits in each lane. The output of the AesHash is fed into the Blake2b hashing function to calculate the final PoW hash. +AesHash was designed for the fastest possible calculation of the Scratchpad fingerprint. It interprets the Scratchpad as a set of AES round keys, so it's equivalent to AES encryption with 32768 rounds. Two extra rounds are performed at the end to ensure avalanche of all Scratchpad bits in each lane. + +The reversible nature of AesHash1R is not a problem for two main reasons: + +* It is not possible to directly control the input of AesHash1R. +* The output of AesHash1R is passed into the Blake2b hashing function, which is not reversible. ### 3.4 SuperscalarHash @@ -468,6 +475,109 @@ This shows that SuperscalaHash has quite low sensitivity to high-order bits and When calculating a Dataset item, the input of the first SuperscalarHash depends only on the item number. To ensure a good distribution of results, the constants described in section 7.3 of the Specification were chosen to provide unique values of bits 3-53 for *all* item numbers in the range 0-34078718 (the Dataset contains 34078719 items). 
All initial register values for all Dataset item numbers were checked to make sure bits 3-53 of each register are unique and there are no collisions (source code: [superscalar-init.cpp](../src/tests/superscalar-init.cpp)). While this is not strictly necessary to get unique output from SuperscalarHash, it's a security precaution that mitigates the non-perfect avalanche properties of the randomly generated SuperscalarHash instances. +### F. Statistical tests of RNG + +Both AesGenerator1R and AesGenerator4R were tested using the TestU01 library [[30](http://simul.iro.umontreal.ca/testu01/tu01.html)] intended for empirical testing of random number generators. The source code is available in [rng-tests.cpp](../src/tests/rng-tests.cpp). + +The tests sample about 200 MB ("SmallCrush" test), 500 GB ("Crush" test) or 4 TB ("BigCrush" test) of output from each generator. This is considerably more than the amounts generated in RandomX (2176 bytes for AesGenerator4R and 2 MiB for AesGenerator1R), so failures in the tests don't necessarily imply that the generators are not suitable for their use case. + + +#### AesGenerator4R +The generator passes all tests in the "BigCrush" suite when initialized using the Blake2b hash function: + +``` +$ bin/rng-tests 1 +state0 = 67e8bbe567a1c18c91a316faf19fab73 +state1 = 39f7c0e0a8d96512c525852124fdc9fe +state2 = 7abb07b2c90e04f098261e323eee8159 +state3 = 3df534c34cdfbb4e70f8c0e1826f4cf7 + +... + +========= Summary results of BigCrush ========= + + Version: TestU01 1.2.3 + Generator: AesGenerator4R + Number of statistics: 160 + Total CPU time: 02:50:18.34 + + All tests were passed +``` + + +The generator passes all tests in the "Crush" suite even with an initial state set to all zeroes. +``` +$ bin/rng-tests 0 +state0 = 00000000000000000000000000000000 +state1 = 00000000000000000000000000000000 +state2 = 00000000000000000000000000000000 +state3 = 00000000000000000000000000000000 + +... 
+ +========= Summary results of Crush ========= + + Version: TestU01 1.2.3 + Generator: AesGenerator4R + Number of statistics: 144 + Total CPU time: 00:25:17.95 + + All tests were passed +``` + +#### AesGenerator1R + +The generator passes all tests in the "Crush" suite when initialized using the Blake2b hash function. + +``` +$ bin/rng-tests 1 +state0 = 67e8bbe567a1c18c91a316faf19fab73 +state1 = 39f7c0e0a8d96512c525852124fdc9fe +state2 = 7abb07b2c90e04f098261e323eee8159 +state3 = 3df534c34cdfbb4e70f8c0e1826f4cf7 + +... + +========= Summary results of Crush ========= + + Version: TestU01 1.2.3 + Generator: AesGenerator1R + Number of statistics: 144 + Total CPU time: 00:25:06.07 + + All tests were passed + +``` + +When the initial state is initialized to all zeroes, the generator fails 1 test out of 144 tests in the "Crush" suite: + +``` +$ bin/rng-tests 0 +state0 = 00000000000000000000000000000000 +state1 = 00000000000000000000000000000000 +state2 = 00000000000000000000000000000000 +state3 = 00000000000000000000000000000000 + +... + +========= Summary results of Crush ========= + + Version: TestU01 1.2.3 + Generator: AesGenerator1R + Number of statistics: 144 + Total CPU time: 00:26:12.75 + The following tests gave p-values outside [0.001, 0.9990]: + (eps means a value < 1.0e-300): + (eps1 means a value < 1.0e-15): + + Test p-value + ---------------------------------------------- + 12 BirthdaySpacings, t = 3 1 - 4.4e-5 + ---------------------------------------------- + All other tests were passed + +``` + ## References [1] CryptoNote whitepaper - https://cryptonote.org/whitepaper.pdf @@ -528,3 +638,5 @@ Cryptocurrencies and Password Hashing - https://eprint.iacr.org/2015/430.pdf Tab [28] J. Daemen, V. 
Rijmen: AES Proposal: Rijndael - https://csrc.nist.gov/csrc/media/projects/cryptographic-standards-and-guidelines/documents/aes-development/rijndael-ammended.pdf page 28 [29] 7-Zip File archiver - https://www.7-zip.org/ + +[30] TestU01 library - http://simul.iro.umontreal.ca/testu01/tu01.html \ No newline at end of file diff --git a/doc/specs.md b/doc/specs.md index 3764c16..6f9ef39 100644 --- a/doc/specs.md +++ b/doc/specs.md @@ -169,41 +169,46 @@ state0 (16 B) state1 (16 B) state2 (16 B) state3 (16 B) ### 3.3 AesGenerator4R -AesGenerator4R works the same way as AesGenerator1R, except it uses 4 rounds per column: +AesGenerator4R works in a similar way to AesGenerator1R, except that it uses 4 rounds per column. Columns 0 and 1 use a different set of keys than columns 2 and 3. ``` state0 (16 B) state1 (16 B) state2 (16 B) state3 (16 B) | | | | AES decrypt AES encrypt AES decrypt AES encrypt - (key0) (key0) (key0) (key0) + (key0) (key0) (key4) (key4) | | | | v v v v AES decrypt AES encrypt AES decrypt AES encrypt - (key1) (key1) (key1) (key1) + (key1) (key1) (key5) (key5) | | | | v v v v AES decrypt AES encrypt AES decrypt AES encrypt - (key2) (key2) (key2) (key2) + (key2) (key2) (key6) (key6) | | | | v v v v AES decrypt AES encrypt AES decrypt AES encrypt - (key3) (key3) (key3) (key3) + (key3) (key3) (key7) (key7) | | | | v v v v state0' state1' state2' state3' ``` -AesGenerator4R uses the following 4 round keys: +AesGenerator4R uses the following 8 round keys: ``` -key0 = 5d 46 90 f8 a6 e4 fb 7f b7 82 1f 14 95 9e 35 cf -key1 = 50 c4 55 6a 8a 27 e8 fe c3 5a 5c bd dc ff 41 67 -key2 = a4 47 4c 11 e4 fd 24 d5 d2 9a 27 a7 ac 4a 32 3d -key3 = 2a 3a 0c 81 ff ae a9 99 d9 db d3 42 08 db f6 76 +key0 = dd aa 21 64 db 3d 83 d1 2b 6d 54 2f 3f d2 e5 99 +key1 = 50 34 0e b2 55 3f 91 b6 53 9d f7 06 e5 cd df a5 +key2 = 04 d9 3e 5c af 7b 5e 51 9f 67 a4 0a bf 02 1c 17 +key3 = 63 37 62 85 08 5d 8f e7 85 37 67 cd 91 d2 de d8 +key4 = 73 6f 82 b5 a6 a7 d6 e3 6d 8b 51 3d b4 ff 9e 22 +key5 = f3 6b 56 
c7 d9 b3 10 9c 4e 4d 02 e9 d2 b7 72 b2 +key6 = e7 c9 73 f2 8b a3 65 f7 0a 66 a9 2b a7 ef 3b f6 +key7 = 09 d6 7c 7a de 39 58 91 fd d1 06 0c 2d 76 b0 c0 ``` These keys were generated as: ``` -key0, key1, key2, key3 = Hash512("RandomX AesGenerator4R keys") +key0, key1, key2, key3 = Hash512("RandomX AesGenerator4R keys 0-3") +key4, key5, key6, key7 = Hash512("RandomX AesGenerator4R keys 4-7") ``` ### 3.4 AesHash1R diff --git a/src/aes_hash.cpp b/src/aes_hash.cpp index c1239aa..a5c0797 100644 --- a/src/aes_hash.cpp +++ b/src/aes_hash.cpp @@ -157,10 +157,14 @@ void fillAes1Rx4(void *state, size_t outputSize, void *buffer) { template void fillAes1Rx4(void *state, size_t outputSize, void *buffer); template void fillAes1Rx4(void *state, size_t outputSize, void *buffer); -#define AES_GEN_4R_KEY0 0xcf359e95, 0x141f82b7, 0x7ffbe4a6, 0xf890465d -#define AES_GEN_4R_KEY1 0x6741ffdc, 0xbd5c5ac3, 0xfee8278a, 0x6a55c450 -#define AES_GEN_4R_KEY2 0x3d324aac, 0xa7279ad2, 0xd524fde4, 0x114c47a4 -#define AES_GEN_4R_KEY3 0x76f6db08, 0x42d3dbd9, 0x99a9aeff, 0x810c3a2a +#define AES_GEN_4R_KEY0 0x99e5d23f, 0x2f546d2b, 0xd1833ddb, 0x6421aadd +#define AES_GEN_4R_KEY1 0xa5dfcde5, 0x06f79d53, 0xb6913f55, 0xb20e3450 +#define AES_GEN_4R_KEY2 0x171c02bf, 0x0aa4679f, 0x515e7baf, 0x5c3ed904 +#define AES_GEN_4R_KEY3 0xd8ded291, 0xcd673785, 0xe78f5d08, 0x85623763 +#define AES_GEN_4R_KEY4 0x229effb4, 0x3d518b6d, 0xe3d6a7a6, 0xb5826f73 +#define AES_GEN_4R_KEY5 0xb272b7d2, 0xe9024d4e, 0x9c10b3d9, 0xc7566bf3 +#define AES_GEN_4R_KEY6 0xf63befa7, 0x2ba9660a, 0xf765a38b, 0xf273c9e7 +#define AES_GEN_4R_KEY7 0xc0b0762d, 0x0c06d1fd, 0x915839de, 0x7a7cd609 template void fillAes4Rx4(void *state, size_t outputSize, void *buffer) { @@ -168,12 +172,16 @@ void fillAes4Rx4(void *state, size_t outputSize, void *buffer) { const uint8_t* outputEnd = outptr + outputSize; rx_vec_i128 state0, state1, state2, state3; - rx_vec_i128 key0, key1, key2, key3; + rx_vec_i128 key0, key1, key2, key3, key4, key5, key6, key7; key0 = 
rx_set_int_vec_i128(AES_GEN_4R_KEY0); key1 = rx_set_int_vec_i128(AES_GEN_4R_KEY1); key2 = rx_set_int_vec_i128(AES_GEN_4R_KEY2); key3 = rx_set_int_vec_i128(AES_GEN_4R_KEY3); + key4 = rx_set_int_vec_i128(AES_GEN_4R_KEY4); + key5 = rx_set_int_vec_i128(AES_GEN_4R_KEY5); + key6 = rx_set_int_vec_i128(AES_GEN_4R_KEY6); + key7 = rx_set_int_vec_i128(AES_GEN_4R_KEY7); state0 = rx_load_vec_i128((rx_vec_i128*)state + 0); state1 = rx_load_vec_i128((rx_vec_i128*)state + 1); @@ -183,23 +191,23 @@ void fillAes4Rx4(void *state, size_t outputSize, void *buffer) { while (outptr < outputEnd) { state0 = aesdec(state0, key0); state1 = aesenc(state1, key0); - state2 = aesdec(state2, key0); - state3 = aesenc(state3, key0); + state2 = aesdec(state2, key4); + state3 = aesenc(state3, key4); state0 = aesdec(state0, key1); state1 = aesenc(state1, key1); - state2 = aesdec(state2, key1); - state3 = aesenc(state3, key1); + state2 = aesdec(state2, key5); + state3 = aesenc(state3, key5); state0 = aesdec(state0, key2); state1 = aesenc(state1, key2); - state2 = aesdec(state2, key2); - state3 = aesenc(state3, key2); + state2 = aesdec(state2, key6); + state3 = aesenc(state3, key6); state0 = aesdec(state0, key3); state1 = aesenc(state1, key3); - state2 = aesdec(state2, key3); - state3 = aesenc(state3, key3); + state2 = aesdec(state2, key7); + state3 = aesenc(state3, key7); rx_store_vec_i128((rx_vec_i128*)outptr + 0, state0); rx_store_vec_i128((rx_vec_i128*)outptr + 1, state1); diff --git a/src/tests/benchmark.cpp b/src/tests/benchmark.cpp index 7343713..2d4f31e 100644 --- a/src/tests/benchmark.cpp +++ b/src/tests/benchmark.cpp @@ -241,7 +241,7 @@ int main(int argc, char** argv) { std::cout << "Calculated result: "; result.print(std::cout); if (noncesCount == 1000 && seedValue == 0) - std::cout << "Reference result: 669ae4f2e5e2c0d9cc232ff2c37d41ae113fa302bbf983d9f3342879831b4edf" << std::endl; + std::cout << "Reference result: a925d346195ef38048e714709e0b24a88fef565fa02fa97127e00fac08ee6eb8" << 
std::endl; if (!miningMode) { std::cout << "Performance: " << 1000 * elapsed / noncesCount << " ms per hash" << std::endl; } diff --git a/src/tests/rng-tests.cpp b/src/tests/rng-tests.cpp new file mode 100644 index 0000000..fed4761 --- /dev/null +++ b/src/tests/rng-tests.cpp @@ -0,0 +1,93 @@ +/* + cd ~ + wget http://simul.iro.umontreal.ca/testu01/TestU01.zip + unzip TestU01.zip + mkdir TestU01 + cd TestU01-1.2.3 + ./configure --prefix=`pwd`/../TestU01 + make -j8 + make install + cd ~/RandomX + g++ -O3 src/tests/rng-tests.cpp -lm -I ~/TestU01/include -L ~/TestU01/lib -L bin/ -l:libtestu01.a -l:libmylib.a -l:libprobdist.a -lrandomx -o bin/rng-tests -DRANDOMX_GEN=4R -DRANDOMX_TESTU01=Crush + bin/rng-tests 0 +*/ + +extern "C" { + #include "unif01.h" + #include "bbattery.h" +} + +#include "../aes_hash.hpp" +#include "../blake2/blake2.h" +#include "utility.hpp" +#include + +#ifndef RANDOMX_GEN +#error Please define RANDOMX_GEN with a value of 1R or 4R +#endif + +#ifndef RANDOMX_TESTU01 +#error Please define RANDOMX_TESTU01 with a value of SmallCrush, Crush or BigCrush +#endif + +#define STR(x) #x +#define CONCAT(a,b,c) a ## b ## c +#define GEN_NAME(x) "AesGenerator" STR(x) +#define GEN_FUNC(x) CONCAT(fillAes, x, x4) +#define TEST_SUITE(x) CONCAT(bbattery_, x,) + +constexpr int GeneratorStateSize = 64; +constexpr int GeneratorCapacity = GeneratorStateSize / sizeof(uint32_t); + +static unsigned long aesGenBits(void *param, void *state) { + uint32_t* statePtr = (uint32_t*)state; + int* indexPtr = (int*)param; + int stateIndex = *indexPtr; + if(stateIndex >= GeneratorCapacity) { + GEN_FUNC(RANDOMX_GEN)(statePtr, GeneratorStateSize, statePtr); + stateIndex = 0; + } + uint32_t next = statePtr[stateIndex]; + *indexPtr = stateIndex + 1; + return next; +} + +static double aesGenDouble(void *param, void *state) { + return aesGenBits (param, state) / unif01_NORM32; +} + +static void aesWriteState(void* state) { + char* statePtr = (char*)state; + for(int i = 0; i < 4; ++i) { + 
std::cout << "state" << i << " = "; + outputHex(std::cout, statePtr + (i * 16), 16); + std::cout << std::endl; + } +} + +int main(int argc, char** argv) { + if (argc != 2) { + std::cout << argv[0] << " " << std::endl; + return 1; + } + uint32_t state[GeneratorCapacity] = { 0 }; + int stateIndex = GeneratorCapacity; + char name[] = GEN_NAME(RANDOMX_GEN); + uint64_t seed = strtoull(argv[1], nullptr, 0); + if(seed) { + blake2b(&state, sizeof(state), &seed, sizeof(seed), nullptr, 0); + } + unif01_Gen gen; + gen.state = &state; + gen.param = &stateIndex; + gen.Write = &aesWriteState; + gen.GetU01 = &aesGenDouble; + gen.GetBits = &aesGenBits; + gen.name = (char*)name; + + gen.Write(gen.state); + std::cout << std::endl; + + TEST_SUITE(RANDOMX_TESTU01)(&gen); + return 0; +} \ No newline at end of file
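
Review note: the key bytes listed in doc/specs.md and the `AES_GEN_4R_KEY*` constants in src/aes_hash.cpp appear to encode the same values in different byte orders. The sketch below checks that reading at compile time for key0 and key4. It assumes each 32-bit constant is a little-endian view of four consecutive specification bytes, and that the first macro argument corresponds to bytes 12..15 (the most significant word first). That ordering is inferred from the constants in this diff, not from the definition of `rx_set_int_vec_i128`. The names `KeyWords`, `keyWordsFromSpecBytes`, `kSpecKey0` and `kSpecKey4` are hypothetical helpers introduced only for this illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Four 32-bit words in the argument order used by the AES_GEN_4R_KEY* macros.
// Assumption: the first word is the most significant one (bytes 12..15).
struct KeyWords {
    std::uint32_t w[4];
};

// Converts the 16 key bytes as printed in the specification into four 32-bit
// words. Each word is assembled little-endian from 4 consecutive bytes, and
// the word built from bytes 12..15 is emitted first.
constexpr KeyWords keyWordsFromSpecBytes(const std::uint8_t (&bytes)[16]) {
    KeyWords out{};
    for (std::size_t word = 0; word < 4; ++word) {
        const std::size_t base = 12 - 4 * word;
        out.w[word] = static_cast<std::uint32_t>(bytes[base])
                    | static_cast<std::uint32_t>(bytes[base + 1]) << 8
                    | static_cast<std::uint32_t>(bytes[base + 2]) << 16
                    | static_cast<std::uint32_t>(bytes[base + 3]) << 24;
    }
    return out;
}

// key0 and key4 copied from the new doc/specs.md text in this diff.
constexpr std::uint8_t kSpecKey0[16] = {
    0xdd, 0xaa, 0x21, 0x64, 0xdb, 0x3d, 0x83, 0xd1,
    0x2b, 0x6d, 0x54, 0x2f, 0x3f, 0xd2, 0xe5, 0x99};
constexpr std::uint8_t kSpecKey4[16] = {
    0x73, 0x6f, 0x82, 0xb5, 0xa6, 0xa7, 0xd6, 0xe3,
    0x6d, 0x8b, 0x51, 0x3d, 0xb4, 0xff, 0x9e, 0x22};

// Compare with AES_GEN_4R_KEY0 and AES_GEN_4R_KEY4 from src/aes_hash.cpp.
static_assert(keyWordsFromSpecBytes(kSpecKey0).w[0] == 0x99e5d23f, "key0 word 0");
static_assert(keyWordsFromSpecBytes(kSpecKey0).w[1] == 0x2f546d2b, "key0 word 1");
static_assert(keyWordsFromSpecBytes(kSpecKey0).w[2] == 0xd1833ddb, "key0 word 2");
static_assert(keyWordsFromSpecBytes(kSpecKey0).w[3] == 0x6421aadd, "key0 word 3");
static_assert(keyWordsFromSpecBytes(kSpecKey4).w[0] == 0x229effb4, "key4 word 0");
static_assert(keyWordsFromSpecBytes(kSpecKey4).w[3] == 0xb5826f73, "key4 word 3");
```

The block has no `main`. Compiling it as a translation unit (C++14 or later) is the check: a mismatch between the documented bytes and the constants would fail a `static_assert`. The same helper could be applied to the remaining keys if a full cross-check is wanted.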