Added DRAM buffer option to rx2c

2024-08-15 00:23:14 +00:00 · 2018-11-11 13:05:34 +01:00
commit 8f3b145fe6
parent 2ea440d0f5
2 changed files with 35 additions and 19 deletions
--- a/README.md
+++ b/README.md
@@ -18,10 +18,10 @@ The VM has access to 4 GiB of external memory in read-only mode. The DRAM memory
 *The DRAM blob can be generated in 0.1-0.3 seconds using 8 threads with hardware-accelerated AES and dual channel DDR3 or DDR4 memory. Dual channel DDR4 memory has enough bandwidth to support up to 16 mining threads.*

 #### MMU
-The memory management unit (MMU) interfaces the CPU with the DRAM blob. The purpose of the MMU is to translate the random memory accesses generated by the random program into a DRAM-friendly access pattern, where memory reads are not bound by access latency. The MMU accepts a 32-bit address `addr` and outputs a 64-bit value from DRAM. The MMU splits the 4 GiB DRAM blob into 256-byte blocks. Data within one block is always read sequentially in 32 reads (32×8 bytes). When the current block has been consumed, reading jumps to a random block. The address of the next block is calculated 8 reads before the current block is exhausted to enable efficient prefetching. The MMU uses three internal registers:
+The memory management unit (MMU) interfaces the CPU with the DRAM blob. The purpose of the MMU is to translate the random memory accesses generated by the random program into a DRAM-friendly access pattern, where memory reads are not bound by access latency. The MMU accepts a 32-bit address `addr` and outputs a 64-bit value from DRAM. The MMU splits the 4 GiB DRAM blob into 256-byte blocks. Data within one block is always read sequentially in 32 reads (32×8 bytes). When the current block has been consumed, reading jumps to a random block. The address of the next block is calculated 16 reads before the current block is exhausted to enable efficient prefetching. The MMU uses three internal registers:
 * **m0** - Address of the next quadword to be read from memory (32-bit, 8-byte aligned).
 * **m1** - Address of the next block to be read from memory (32-bit, 256-byte aligned).
-* **mx** - Random 32-bit counter that determines the address of the next block. After each read, the read address is mixed with the counter: `mx ^= addr`. When the 24th quadword of the current block is read (the value of the `m0` register ends with `0xC0`), the value of the `mx` register is copied into register `m1` and the last 8 bits of `m1` are cleared.
+* **mx** - Random 32-bit counter that determines the address of the next block. After each read, the read address is mixed with the counter: `mx ^= addr`. When the 16th quadword of the current block is read (the value of the `m0` register ends with `0x80`), the value of the `mx` register is copied into register `m1` and the last 8 bits of `m1` are cleared.

 *When the value of the `m1` register is changed, the memory location can be preloaded into CPU cache using the x86 `PREFETCH` instruction or ARM `PRFM` instruction. Implicit prefetch should ensure that sequentially accessed memory is already in the cache.*
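
A minimal Python sketch of the MMU behaviour described on the added (`+`) lines above, for illustration only. The class name `MMU`, the `dram` callable, the initial values of `m1` and `mx`, and the exact point at which `m0` is tested (here: after it has been advanced by 8 bytes) are assumptions, not taken from the source.

```python
MASK32 = 0xFFFFFFFF
BLOCK_MASK = 0xFFFFFF00  # clears the last 8 bits (256-byte alignment)


class MMU:
    def __init__(self, dram, m0):
        # dram: hypothetical callable mapping a byte address to a 64-bit value
        self.dram = dram
        self.m0 = m0 & BLOCK_MASK  # next quadword to read
        self.m1 = self.m0          # assumption: next block starts equal to m0
        self.mx = 0                # assumption: counter starts at zero

    def read(self, addr):
        value = self.dram(self.m0)
        self.mx ^= addr & MASK32   # mix the read address into the counter
        self.m0 = (self.m0 + 8) & MASK32
        low = self.m0 & 0xFF
        if low == 0x80:
            # 16 quadwords consumed: fix the next block address early,
            # leaving 16 reads for a prefetch to complete
            self.m1 = self.mx & BLOCK_MASK
        elif low == 0x00:
            # all 32 quadwords consumed: jump to the next block
            self.m0 = self.m1
        return value
```

In this sketch only the addresses seen during the first 16 reads of a block influence which block is read next; a real implementation would issue the prefetch at the point where `m1` is assigned.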

@@ -169,7 +169,7 @@ A 32-bit address mask that is used to calculate the write address for the C oper
 |147-157|ROR_64|no|64|6|A >>> B|64|

 ##### 32-bit operations
-Instructions ADD_32, SUB_32, AND_32, OR_32, XOR_32 only use the low-order 32 bits of the input operands. The result of these operations are 32 bits long and bits 32-63 of C are zero.
+Instructions ADD_32, SUB_32, AND_32, OR_32, XOR_32 only use the low-order 32 bits of the input operands. The result of these operations is 32 bits long and bits 32-63 of C are zero.
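
The 32-bit rule above can be illustrated with a short Python sketch. The names `OPS_32` and `exec_32` are hypothetical, and wrap-around on overflow is an assumption; the text only states that the result is 32 bits long with bits 32-63 zero.

```python
MASK32 = 0xFFFFFFFF

# Each operation sees only the low-order 32 bits of its operands.
OPS_32 = {
    "ADD_32": lambda a, b: a + b,
    "SUB_32": lambda a, b: a - b,
    "AND_32": lambda a, b: a & b,
    "OR_32": lambda a, b: a | b,
    "XOR_32": lambda a, b: a ^ b,
}


def exec_32(name, a, b):
    # Truncate the inputs, apply the operation, then truncate the result
    # so that bits 32-63 of C are zero.
    return OPS_32[name](a & MASK32, b & MASK32) & MASK32
```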

 ##### Multiplication
 There are 5 different multiplication operations. MUL_64 and MULH_64 both take 64-bit unsigned operands, but MUL_64 produces the low 64 bits of the result and MULH_64 produces the high 64 bits. MUL_32 and IMUL_32 use only the low-order 32 bits of the operands and produce a 64-bit result. The signed variant interprets the arguments as signed integers. IMULH_64 takes two 64-bit signed operands and produces the high-order 64 bits of the result.
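
As a rough illustration, the five multiplication variants can be modelled in Python with arbitrary-precision integers. The function names and the `to_signed` helper are hypothetical, and returning signed results as unsigned 64-bit two's complement values is an assumption.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def to_signed(x, bits):
    # Interpret the low `bits` bits of x as a two's complement integer.
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def mul_64(a, b):
    # Low 64 bits of the unsigned product.
    return ((a & MASK64) * (b & MASK64)) & MASK64


def mulh_64(a, b):
    # High 64 bits of the unsigned 128-bit product.
    return ((a & MASK64) * (b & MASK64)) >> 64


def mul_32(a, b):
    # Unsigned 32 x 32 -> 64-bit product.
    return (a & MASK32) * (b & MASK32)


def imul_32(a, b):
    # Signed 32 x 32 -> 64-bit product.
    return (to_signed(a, 32) * to_signed(b, 32)) & MASK64


def imulh_64(a, b):
    # High 64 bits of the signed 128-bit product.
    return ((to_signed(a, 64) * to_signed(b, 64)) >> 64) & MASK64
```

For example, with operands `0xFFFFFFFFFFFFFFFF` and `2`, `mulh_64` treats the first operand as 2^64 - 1, while `imulh_64` treats it as -1, so the two high halves differ.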
@@ -246,7 +246,7 @@ The RET instruction behaves like "not taken" when the stack is empty. Taken RET
 The program is initialized from a 256-bit seed value `S`.
 1. A [pcg32](http://www.pcg-random.org/) random number generator is initialized with state `S[63:0]`.
 2. The generator is used to generate 128 random bytes `R1`.
-3. Integer registers `r0`-`r7` are initialized using bytes 0-63 bytes of `R1`.
+3. Integer registers `r0`-`r7` are initialized using bytes 0-63 of `R1`.
 4. Floating point registers `f0`-`f7` are initialized using bytes 64-127 of `R1` interpreted as 8 64-bit signed integers converted to a double precision floating point format.
 5. The initial value of the `m0` register is set to `S[95:64]` and the last 8 bits are cleared (256-byte aligned).
 6. `S` is expanded into 10 AES round keys `K0`-`K9`.