mirror of
				https://git.wownero.com/wownero/RandomWOW.git
				synced 2024-08-15 00:23:14 +00:00 
			
		
		
		
	Reworked instruction set documentation
This commit is contained in:
		
							parent
							
								
									d1a808643d
								
							
						
					
					
						commit
						6941b2cb69
					
				
					 2 changed files with 294 additions and 197 deletions
				
			
		
							
								
								
									
										129
									
								
								doc/isa-ops.md
									
										
									
									
									
										Normal file
									
								
							
							
						
						
									
										129
									
								
								doc/isa-ops.md
									
										
									
									
									
										Normal file
									
								
							|  | @ -0,0 +1,129 @@ | |||
| 
 | ||||
| # RandomX instruction listing | ||||
| There are 31 unique instructions divided into 3 groups: | ||||
| 
 | ||||
| |group|# operations|# opcodes|| | ||||
| |---------|-----------------|----|-| | ||||
| |integer (IA)|22|144|56.3%| | ||||
| |floating point (FP)|5|76|29.7%| | ||||
| |control (CL)|4|36|14.0% | ||||
| ||**31**|**256**|**100%** | ||||
| 
 | ||||
| 
 | ||||
| ## Integer instructions | ||||
| There are 22 integer instructions. They are divided into 3 classes (MATH, DIV, SHIFT) with different B operand selection rules. | ||||
| |# opcodes|instruction|class|signed|A width|B width|C|C width| | ||||
| |-|-|-|-|-|-|-|-| | ||||
| |12|ADD_64|MATH|no|64|64|`A + B`|64| | ||||
| |2|ADD_32|MATH|no|32|32|`A + B`|32| | ||||
| |12|SUB_64|MATH|no|64|64|`A - B`|64| | ||||
| |2|SUB_32|MATH|no|32|32|`A - B`|32| | ||||
| |21|MUL_64|MATH|no|64|64|`A * B`|64| | ||||
| |10|MULH_64|MATH|no|64|64|`A * B`|64| | ||||
| |15|MUL_32|MATH|no|32|32|`A * B`|64| | ||||
| |15|IMUL_32|MATH|yes|32|32|`A * B`|64| | ||||
| |10|IMULH_64|MATH|yes|64|64|`A * B`|64| | ||||
| |4|DIV_64|DIV|no|64|32|`A / B`|64| | ||||
| |4|IDIV_64|DIV|yes|64|32|`A / B`|64| | ||||
| |4|AND_64|MATH|no|64|64|`A & B`|64| | ||||
| |2|AND_32|MATH|no|32|32|`A & B`|32| | ||||
| |4|OR_64|MATH|no|64|64|`A | B`|64| | ||||
| |2|OR_32|MATH|no|32|32|`A | B`|32| | ||||
| |4|XOR_64|MATH|no|64|64|`A ^ B`|64| | ||||
| |2|XOR_32|MATH|no|32|32|`A ^ B`|32| | ||||
| |3|SHL_64|SHIFT|no|64|6|`A << B`|64| | ||||
| |3|SHR_64|SHIFT|no|64|6|`A >> B`|64| | ||||
| |3|SAR_64|SHIFT|yes|64|6|`A >> B`|64| | ||||
| |6|ROL_64|SHIFT|no|64|6|`A <<< B`|64| | ||||
| |6|ROR_64|SHIFT|no|64|6|`A >>> B`|64| | ||||
| 
 | ||||
| #### 32-bit operations | ||||
| Instructions ADD_32, SUB_32, AND_32, OR_32, XOR_32 only use the low-order 32 bits of the input operands. The result of these operations is 32 bits long and bits 32-63 of C are set to zero. | ||||
| 
 | ||||
| #### Multiplication | ||||
| There are 5 different multiplication operations. MUL_64 and MULH_64 both take 64-bit unsigned operands, but MUL_64 produces the low 64 bits of the result and MULH_64 produces the high 64 bits. MUL_32 and IMUL_32 use only the low-order 32 bits of the operands and produce a 64-bit result. The signed variant interprets the arguments as signed integers. IMULH_64 takes two 64-bit signed operands and produces the high-order 64 bits of the result. | ||||
| 
 | ||||
| #### Division | ||||
| For the division instructions, the dividend is 64 bits long and the divisor 32 bits long. The IDIV_64 instruction interprets both operands as signed integers. In case of division by zero or signed overflow, the result is equal to the dividend `A`. | ||||
| 
 | ||||
| 75% of division instructions use a runtime-constant divisor and can be optimized using a multiplication and shifts. | ||||
| 
 | ||||
| #### Shift and rotate | ||||
| The shift/rotate instructions use just the bottom 6 bits of the `B` operand (`imm8` is used as the immediate value). All treat `A` as unsigned except SAR_64, which performs an arithmetic right shift by copying the sign bit. | ||||
| 
 | ||||
| ## Floating point instructions | ||||
| There are 5 floating point instructions. All floating point instructions are vector instructions that operate on two packed double precision floating point values. | ||||
| 
 | ||||
| |# opcodes|instruction|C| | ||||
| |-|-|-|-| | ||||
| |20|FPADD|`A + B`| | ||||
| |20|FPSUB|`A - B`| | ||||
| |22|FPMUL|`A * B`| | ||||
| |8|FPDIV|`A / B`| | ||||
| |6|FPSQRT|`sqrt(abs(A))`| | ||||
| 
 | ||||
| #### Conversion of operand A | ||||
| Operand A is loaded from memory as a 64-bit value. All floating point instructions interpret A as two packed 32-bit signed integers and convert them into two packed double precision floating point values. | ||||
| 
 | ||||
| #### Rounding | ||||
| FPU instructions conform to the IEEE-754 specification, so they must give correctly rounded results. Initial rounding mode is *roundTiesToEven*. Rounding mode can be changed by the `FPROUND` control instruction. Denormal values must be always flushed to zero. | ||||
| 
 | ||||
| #### NaN | ||||
| If an operation produces NaN, the result is converted into positive zero. NaN results may never be written into registers or memory. Only division and multiplication must be checked for NaN results (`0.0 / 0.0` and `0.0 * Infinity` result in NaN). | ||||
| 
 | ||||
| ## Control instructions | ||||
| There are 4 control instructions. | ||||
| 
 | ||||
| |# opcodes|instruction|description|condition| | ||||
| |-|-|-|-| | ||||
| |2|FPROUND|change floating point rounding mode|- | ||||
| |11|JUMP|conditional jump|(see condition table below) | ||||
| |11|CALL|conditional procedure call|(see condition table below) | ||||
| |12|RET|return from procedure|stack is not empty | ||||
| 
 | ||||
| All control instructions behave as 'arithmetic no-op' and simply copy the input operand A into the destination C. | ||||
| 
 | ||||
| The JUMP and CALL instructions use a condition function, which takes the lower 32 bits of operand B (register) and the value `imm32` and evaluates a condition based on the `B.LOC.C` flag:  | ||||
| 
 | ||||
| |`B.LOC.C`|signed|jump condition|probability|*x86*|*ARM* | ||||
| |---|---|----------|-----|--|----| | ||||
| |0|no|`B <= imm32`|0% - 100%|`JBE`|`BLS` | ||||
| |1|no|`B > imm32`|0% - 100%|`JA`|`BHI` | ||||
| |2|yes|`B - imm32 < 0`|50%|`JS`|`BMI` | ||||
| |3|yes|`B - imm32 >= 0`|50%|`JNS`|`BPL` | ||||
| |4|yes|`B - imm32` overflows|0% - 50%|`JO`|`BVS` | ||||
| |5|yes|`B - imm32` doesn't overflow|50% - 100%|`JNO`|`BVC` | ||||
| |6|yes|`B < imm32`|0% - 100%|`JL`|`BLT` | ||||
| |7|yes|`B >= imm32`|0% - 100%|`JGE`|`BGE` | ||||
| 
 | ||||
| The 'signed' column specifies if the operands are interpreted as signed or unsigned 32-bit numbers. Column 'probability' lists the expected jump probability (range means that the actual value for a specific instruction depends on `imm32`). *Columns 'x86' and 'ARM' list the corresponding hardware instructions (following a `CMP` instruction).* | ||||
| 
 | ||||
| ### FPROUND | ||||
| The FPROUND instruction changes the rounding mode for all subsequent FPU operations depending on a two-bit flag. The flag is calculated by rotating A `imm8` bits to the right and taking the two least-significant bits: | ||||
| 
 | ||||
| ``` | ||||
| rounding flag = (A >>> imm8)[1:0] | ||||
| ``` | ||||
| 
 | ||||
| |rounding flag|rounding mode| | ||||
| |-------|------------| | ||||
| |00|roundTiesToEven| | ||||
| |01|roundTowardNegative| | ||||
| |10|roundTowardPositive| | ||||
| |11|roundTowardZero| | ||||
| 
 | ||||
| The rounding modes are defined by the IEEE-754 standard. | ||||
| 
 | ||||
| *The two-bit flag value exactly corresponds to bits 13-14 of the x86 `MXCSR` register and bits 23 and 22 (reversed) of the ARM `FPSCR` register.* | ||||
| 
 | ||||
| ### JUMP | ||||
| If the jump condition is `true`, the JUMP instruction performs a forward jump relative to the value of `pc`. The forward offset is equal to `16 * (imm8[6:0] + 1)` bytes (1-128 instructions forward). | ||||
| 
 | ||||
| ### CALL | ||||
| If the jump condition is `true`, the CALL instruction pushes the value of `pc` (program counter) onto the stack and then performs a forward jump relative to the value of `pc`. The forward offset is equal to `16 * (imm8[6:0] + 1)` bytes (1-128 instructions forward). | ||||
| 
 | ||||
| ### RET | ||||
| If the stack is not empty, the RET instruction pops the return address from the stack (it's the instruction following the previous CALL) and jumps to it. | ||||
| 
 | ||||
| ## Reference implementation | ||||
| A portable C++ implementation of all integer and floating point instructions is available in [instructionsPortable.cpp](../src/instructionsPortable.cpp). | ||||
							
								
								
									
										362
									
								
								doc/isa.md
									
										
									
									
									
								
							
							
						
						
									
										362
									
								
								doc/isa.md
									
										
									
									
									
								
							|  | @ -1,213 +1,181 @@ | |||
| # RandomX instruction encoding | ||||
| The instruction set was designed in such way that any random 16-byte word is a valid instruction and any sequence of valid instructions is a valid program. There are no syntax rules. | ||||
| 
 | ||||
| ## RandomX instruction set | ||||
| RandomX uses a simple low-level language (instruction set), which was designed so that any random bitstring forms a valid program. | ||||
| The encoding of each 128-bit instruction word is following: | ||||
| 
 | ||||
| Each RandomX instruction has a length of 128 bits. The encoding is following: | ||||
|  | ||||
| 
 | ||||
|  | ||||
| ## opcode | ||||
| There are 256 opcodes, which are distributed between 3 groups of instructions. There are 31 distinct operations (each operation can be encoded using multiple opcodes - for example opcodes `0x00` to `0x0d` correspond to integer addition). | ||||
| 
 | ||||
| *All flags are aligned to an 8-bit boundary for easier decoding.* | ||||
| **Table 1: Instruction groups** | ||||
| |group|# operations|# opcodes|| | ||||
| |---------|-----------------|----|-| | ||||
| |integer (IA)|22|144|56.3%| | ||||
| |floating point (FP)|5|76|29.7%| | ||||
| |control (CL)|4|36|14.0% | ||||
| ||**31**|**256**|**100%** | ||||
| 
 | ||||
| #### Opcode | ||||
| There are 256 opcodes, which are distributed between 30 instructions based on their weight (how often they will occur in the program on average). Instructions are divided into 5 groups: | ||||
| Full description of all instructions: [isa-ops.md](isa-ops.md). | ||||
| 
 | ||||
| |group|number of opcodes||comment| | ||||
| |---------|-----------------|----|------| | ||||
| |IA|115|44.9%|integer arithmetic operations | ||||
| |IS|21|8.2%|bitwise shift and rotate | ||||
| |FA|70|27.4%|floating point arithmetic operations | ||||
| |FS|8|3.1%|floating point single-input operations | ||||
| |CF|42|16.4%|control flow instructions (branches) | ||||
| ||**256**|**100%** | ||||
| ## A.LOC | ||||
| **Table 2: `A.LOC` encoding** | ||||
| 
 | ||||
| #### Operand A | ||||
| The first 64-bit operand is read from memory. The location is determined by the `loc(a)` flag: | ||||
| |bits|description| | ||||
| |----|--------| | ||||
| |0-1|`A.LOC.W` flag| | ||||
| |2-5|Reserved| | ||||
| |6-7|`A.LOC.X` flag| | ||||
| 
 | ||||
| |loc(a)[2:0]|read A from|address size (W) | ||||
| The `A.LOC.W` flag determines the address width when reading operand A from the scratchpad: | ||||
| 
 | ||||
| **Table 3: Operand A read address width** | ||||
| 
 | ||||
| |`A.LOC.W`|address width (W) | ||||
| |---------|-|-| | ||||
| |000|dataset|32 bits| | ||||
| |001|dataset|32 bits| | ||||
| |010|dataset|32 bits| | ||||
| |011|dataset|32 bits| | ||||
| |100|scratchpad|15 bits| | ||||
| |101|scratchpad|11 bits| | ||||
| |110|scratchpad|11 bits| | ||||
| |111|scratchpad|11 bits| | ||||
| |0|15 bits (256 KiB)| | ||||
| |1-3|11 bits (16 KiB)| | ||||
| 
 | ||||
| Flag `reg(a)` encodes an integer register `r0`-`r7`.  The read address is calculated as: | ||||
| If the `A.LOC.W` flag is zero, the address space covers the whole 256 KiB scratchpad. Otherwise, just the first 16 KiB of the scratchpad are addressed. | ||||
| 
 | ||||
| If the `A.LOC.X` flag is zero, the instruction mixes the scratchpad read address into the `mx` register using XOR. This mixing happens before the address is truncated to W bits (see pseudocode below). | ||||
| 
 | ||||
| ## A.REG | ||||
| **Table 4: `A.REG` encoding** | ||||
| 
 | ||||
| |bits|description| | ||||
| |----|--------| | ||||
| |0-2|`A.REG.R` flag| | ||||
| |3-7|Reserved| | ||||
| 
 | ||||
| The `A.REG.R` flag encodes "readAddressRegister", which is an integer register  `r0`-`r7` to be used for scratchpad read address generation. Read address is generated as follows (pseudocode): | ||||
| 
 | ||||
| ```python | ||||
| readAddressRegister = IntegerRegister(A.REG.R) | ||||
| readAddressRegister = readAddressRegister XOR SignExtend(A.mask32) | ||||
| readAddress = readAddressRegister[31:0] | ||||
| # dataset is read if the ic register is divisible by 64 | ||||
| IF ic mod 64 == 0: | ||||
|   DatasetRead(readAddress) | ||||
| # optional mixing into the mx register | ||||
| IF A.LOC.X == 0: | ||||
|   mx = mx XOR readAddress | ||||
| # truncate to W bits | ||||
| W = GetAddressWidth(A.LOC.W) | ||||
| readAddress = readAddress[W-1:0] | ||||
| ``` | ||||
| reg(a) = reg(a) XOR signExtend(addr(a)) | ||||
| read_addr = reg(a)[W-1:0] | ||||
| 
 | ||||
| Note that the value of the read address register is modified during address generation. | ||||
| 
 | ||||
| ## B.LOC | ||||
| **Table 5: `B.LOC` encoding** | ||||
| 
 | ||||
| |bits|description| | ||||
| |----|--------| | ||||
| |0-1|`B.LOC.L` flag| | ||||
| |0-2|`B.LOC.C` flag| | ||||
| |3-7|Reserved| | ||||
| 
 | ||||
| The `B.LOC.L` flag determines the B operand. It can be either a register or immediate value. | ||||
| 
 | ||||
| **Table 6: Operand B** | ||||
| 
 | ||||
| |`B.LOC.L`|IA/DIV|IA/SHIFT|IA/MATH|FP|CL| | ||||
| |----|--------|----|------|----|---| | ||||
| |0|register|register|register|register|register| | ||||
| |1|`imm32`|register|register|register|register| | ||||
| |2|`imm32`|`imm8`|register|register|register| | ||||
| |3|`imm32`|`imm8`|`imm32`|register|register| | ||||
| 
 | ||||
| Integer instructions are split into 3 classes: integer division (IA/DIV), shift and rotate (IA/SHIFT) and other (IA/MATH). Floating point (FP) and control (CL) instructions always use a register operand. | ||||
| 
 | ||||
| Register to be used as operand B is encoded in the `B.REG.R` flag (see below). | ||||
| 
 | ||||
| The `B.LOC.C` flag determines the condition for the JUMP and CALL instructions. The flag partially overlaps with the `B.LOC.L` flag. | ||||
| 
 | ||||
| ## B.REG | ||||
| **Table 7: `B.REG` encoding** | ||||
| 
 | ||||
| |bits|description| | ||||
| |----|--------| | ||||
| |0-2|`B.REG.R` flag| | ||||
| |3-7|Reserved| | ||||
| 
 | ||||
| Register encoded by the `B.REG.R` depends on the instruction group: | ||||
| 
 | ||||
| **Table 8: Register operands by group** | ||||
| 
 | ||||
| |group|registers| | ||||
| |----|--------| | ||||
| |IA|`r0`-`r7`| | ||||
| |FP|`f0`-`f7`| | ||||
| |CL|`r0`-`r7`| | ||||
| 
 | ||||
| ##  C.LOC | ||||
| **Table 9: `C.LOC` encoding** | ||||
| 
 | ||||
| |bits|description| | ||||
| |----|--------| | ||||
| |0-1|`C.LOC.W` flag| | ||||
| |2|`C.LOC.R` flag| | ||||
| |3-6|Reserved| | ||||
| |7|`C.LOC.H` flag| | ||||
| 
 | ||||
| The `C.LOC.W` flag determines the address width when writing operand C to the scratchpad: | ||||
| 
 | ||||
| **Table 10: Operand C write address width** | ||||
| 
 | ||||
| |`C.LOC.W`|address width (W) | ||||
| |---------|-|-| | ||||
| |0|15 bits (256 KiB)| | ||||
| |1-3|11 bits (16 KiB)| | ||||
| 
 | ||||
| If the `C.LOC.W` flag is zero, the address space covers the whole 256 KiB scratchpad. Otherwise, just the first 16 KiB of the scratchpad are addressed. | ||||
| 
 | ||||
| The `C.LOC.R` determines the destination where operand C is written: | ||||
| 
 | ||||
| **Table 11: Operand C destination** | ||||
| 
 | ||||
| |`C.LOC.R`|groups IA, CL|group FP | ||||
| |---------|-|-| | ||||
| |0|scratchpad|register | ||||
| |1|register|register + scratchpad | ||||
| 
 | ||||
| Integer and control instructions (groups IA and CL) write either to the scratchpad or to a register. Floating point instructions always write to a register and can also write to the scratchpad. In that case, flag `C.LOC.H` determines if the low or high half of the register is written: | ||||
| 
 | ||||
| **Table 12: Floating point register write** | ||||
| 
 | ||||
| |`C.LOC.H`|write bits| | ||||
| |---------|----------| | ||||
| |0|0-63| | ||||
| |1|64-127| | ||||
| 
 | ||||
| ## C.REG | ||||
| **Table 13: `C.REG` encoding** | ||||
| 
 | ||||
| |bits|description| | ||||
| |----|--------| | ||||
| |0-2|`C.REG.R` flag| | ||||
| |3-7|Reserved| | ||||
| 
 | ||||
| The destination register encoded in the `C.REG.R` flag encodes both the write address register (if writing to the scratchpad) and the destination register (if writing to a register). The destination register depends on the instruction group (see Table 8). Write address is always generated from an integer register: | ||||
| 
 | ||||
| ```python | ||||
| writeAddressRegister = IntegerRegister(C.REG.R) | ||||
| writeAddress = writeAddressRegister[31:0] XOR C.mask32 | ||||
| # truncate to W bits | ||||
| W = GetAddressWidth(C.LOC.W) | ||||
| writeAddress = writeAddress [W-1:0] | ||||
| ``` | ||||
| `W` is the address width from the above table. For reading from the scratchpad, `read_addr` is multiplied by 8 for 8-byte aligned access. | ||||
| 
 | ||||
| #### Operand B | ||||
| The second operand is loaded either from a register or from an immediate value encoded within the instruction. The `reg(b)` flag encodes an integer register (instruction groups IA and IS) or a floating point register (instruction group FA). Instruction group FS doesn't use operand B. | ||||
| ## imm8 | ||||
| `imm8` is an 8-bit immediate value that is used as the B operand by IA/SHIFT instructions (see Table 6). Additionally, it's used by some control instructions. | ||||
| 
 | ||||
| |loc(b)[2:0]|B (IA)|B (IS)|B (FA)|B (FS) | ||||
| |---------|-|-|-|-| | ||||
| |000|integer `reg(b)`|integer `reg(b)`|floating point `reg(b)`|- | ||||
| |001|integer `reg(b)`|integer `reg(b)`|floating point `reg(b)`|- | ||||
| |010|integer `reg(b)`|integer `reg(b)`|floating point `reg(b)`|- | ||||
| |011|integer `reg(b)`|integer `reg(b)`|floating point `reg(b)`|- | ||||
| |100|integer `reg(b)`|`imm8`|floating point `reg(b)`|- | ||||
| |101|integer `reg(b)`|`imm8`|floating point `reg(b)`|- | ||||
| |110|`imm32`|`imm8`|floating point `reg(b)`|- | ||||
| |111|`imm32`|`imm8`|floating point `reg(b)`|- | ||||
| ## A.mask32 | ||||
| `A.mask32` is a 32-bit address mask that is used to calculate the read address for the A operand. It's sign-extended to 64 bits before use. | ||||
| 
 | ||||
| `imm8` is an 8-bit immediate value, which is used for shift and rotate integer instructions (group IS). Only bits 0-5 are used. | ||||
| ## imm32 | ||||
| `imm32` is a 32-bit immediate value which is used for integer instructions from groups IA/DIV and IA/OTHER (see Table 6). The immediate value is sign-extended for instructions that expect 64-bit operands. | ||||
| 
 | ||||
| `imm32` is a 32-bit immediate value which is used for integer instructions from group IA. | ||||
| 
 | ||||
| Floating point instructions don't use immediate values. | ||||
| 
 | ||||
| #### Operand C | ||||
| The third operand is the location where the result is stored. It can be a register or a 64-bit scratchpad location, depending on the value of flag `loc(c)`. | ||||
| 
 | ||||
| |loc\(c\)[2:0]|address size (W)| C (IA, IS)|C (FA, FS) | ||||
| |---------|-|-|-|-|-| | ||||
| |000|15 bits|scratchpad|floating point `reg(c)` | ||||
| |001|11 bits|scratchpad|floating point `reg(c)` | ||||
| |010|11 bits|scratchpad|floating point `reg(c)` | ||||
| |011|11 bits|scratchpad|floating point `reg(c)` | ||||
| |100|15 bits|integer `reg(c)`|floating point `reg(c)`, scratchpad | ||||
| |101|11 bits|integer `reg(c)`|floating point `reg(c)`, scratchpad | ||||
| |110|11 bits|integer `reg(c)`|floating point `reg(c)`, scratchpad | ||||
| |111|11 bits|integer `reg(c)`|floating point `reg(c)`, scratchpad | ||||
| 
 | ||||
| Integer operations write either to the scratchpad or to a register. Floating point operations always write to a register and can also write to the scratchpad. In that case, bit 3 of the `loc(c)` flag determines if the low or high half of the register is written: | ||||
| 
 | ||||
| |loc\(c\)[3]|write to scratchpad| | ||||
| |------------|-----------------------| | ||||
| |0|floating point `reg(c)[63:0]` | ||||
| |1|floating point `reg(c)[127:64]` | ||||
| 
 | ||||
| The FPROUND instruction is an exception and always writes the low half of the register. | ||||
| 
 | ||||
| For writing to the scratchpad, an integer register is always used to calculate the address: | ||||
| ``` | ||||
| write_addr = 8 * (addr(c) XOR reg(c)[31:0])[W-1:0] | ||||
| ``` | ||||
| *CPUs are typically designed for a 2:1 load:store ratio, so each VM instruction performs on average 1 memory read and 0.5 writes to memory.* | ||||
| 
 | ||||
| #### imm8 | ||||
| An 8-bit immediate value that is used as the shift/rotate count by group IS instructions and as the jump offset of the CALL instruction. | ||||
| 
 | ||||
| #### addr(a) | ||||
| A 32-bit address mask that is used to calculate the read address for the A operand. It's sign-extended to 64 bits. | ||||
| 
 | ||||
| #### addr\(c\) | ||||
| A 32-bit address mask that is used to calculate the write address for the C operand. `addr(c)` is equal to `imm32`. | ||||
| 
 | ||||
| ### ALU instructions | ||||
| 
 | ||||
| |weight|instruction|group|signed|A width|B width|C|C width| | ||||
| |-|-|-|-|-|-|-|-| | ||||
| |10|ADD_64|IA|no|64|64|`A + B`|64| | ||||
| |2|ADD_32|IA|no|32|32|`A + B`|32| | ||||
| |10|SUB_64|IA|no|64|64|`A - B`|64| | ||||
| |2|SUB_32|IA|no|32|32|`A - B`|32| | ||||
| |21|MUL_64|IA|no|64|64|`A * B`|64| | ||||
| |10|MULH_64|IA|no|64|64|`A * B`|64| | ||||
| |15|MUL_32|IA|no|32|32|`A * B`|64| | ||||
| |15|IMUL_32|IA|yes|32|32|`A * B`|64| | ||||
| |10|IMULH_64|IA|yes|64|64|`A * B`|64| | ||||
| |1|DIV_64|IA|no|64|32|`A / B`|32| | ||||
| |1|IDIV_64|IA|yes|64|32|`A / B`|32| | ||||
| |4|AND_64|IA|no|64|64|`A & B`|64| | ||||
| |2|AND_32|IA|no|32|32|`A & B`|32| | ||||
| |4|OR_64|IA|no|64|64|`A | B`|64| | ||||
| |2|OR_32|IA|no|32|32|`A | B`|32| | ||||
| |4|XOR_64|IA|no|64|64|`A ^ B`|64| | ||||
| |2|XOR_32|IA|no|32|32|`A ^ B`|32| | ||||
| |3|SHL_64|IS|no|64|6|`A << B`|64| | ||||
| |3|SHR_64|IS|no|64|6|`A >> B`|64| | ||||
| |3|SAR_64|IS|yes|64|6|`A >> B`|64| | ||||
| |6|ROL_64|IS|no|64|6|`A <<< B`|64| | ||||
| |6|ROR_64|IS|no|64|6|`A >>> B`|64| | ||||
| 
 | ||||
| ##### 32-bit operations | ||||
| Instructions ADD_32, SUB_32, AND_32, OR_32, XOR_32 only use the low-order 32 bits of the input operands. The result of these operations is 32 bits long and bits 32-63 of C are set to zero. | ||||
| 
 | ||||
| ##### Multiplication | ||||
| There are 5 different multiplication operations. MUL_64 and MULH_64 both take 64-bit unsigned operands, but MUL_64 produces the low 64 bits of the result and MULH_64 produces the high 64 bits. MUL_32 and IMUL_32 use only the low-order 32 bits of the operands and produce a 64-bit result. The signed variant interprets the arguments as signed integers. IMULH_64 takes two 64-bit signed operands and produces the high-order 64 bits of the result. | ||||
| 
 | ||||
| ##### Division | ||||
| For the division instructions, the dividend is 64 bits long and the divisor 32 bits long. The IDIV_64 instruction interprets both operands as signed integers. In case of division by zero or signed overflow, the result is equal to the dividend `A`. | ||||
| 
 | ||||
| *Division by zero can be handled without branching by a conditional move. Signed overflow happens only for the signed variant when the minimum negative value is divided by -1. This rare case must be handled in x86 (ARM produces the "correct" result).* | ||||
| 
 | ||||
| ##### Shift and rotate | ||||
| The shift/rotate instructions use just the bottom 6 bits of the `B` operand (`imm8` is used as the immediate value). All treat `A` as unsigned except SAR_64, which performs an arithmetic right shift by copying the sign bit. | ||||
| 
 | ||||
| ### FPU instructions | ||||
| 
 | ||||
| |weight|instruction|group|C| | ||||
| |-|-|-|-| | ||||
| |20|FPADD|FA|`A + B`| | ||||
| |20|FPSUB|FA|`A - B`| | ||||
| |22|FPMUL|FA|`A * B`| | ||||
| |8|FPDIV|FA|`A / B`| | ||||
| |6|FPSQRT|FS|`sqrt(abs(A))`| | ||||
| |2|FPROUND|FS|`convertSigned52(A)`| | ||||
| 
 | ||||
| All floating point instructions apart FPROUND are vector instructions that operate on two packed double precision floating point values. | ||||
| 
 | ||||
| #### Conversion of operand A | ||||
| Operand A is loaded from memory as a 64-bit value. All floating point instructions apart FPROUND interpret A as two packed 32-bit signed integers and convert them into two packed double precision floating point values. | ||||
| 
 | ||||
| The FPROUND instruction has a scalar output and interprets A as a 64-bit signed integer. The 11 least-significant bits are cleared before conversion to a double precision format. This is done so the number fits exactly into the 52-bit mantissa without rounding. Output of FPROUND is always written into the lower half of the result register and only this lower half may be written into the scratchpad. | ||||
| 
 | ||||
| #### Rounding | ||||
| FPU instructions conform to the IEEE-754 specification, so they must give correctly rounded results. Initial rounding mode is *roundTiesToEven*. Rounding mode can be changed by the `FPROUND` instruction. Denormal values must be flushed to zero. | ||||
| 
 | ||||
| #### NaN | ||||
| If an operation produces NaN, the result is converted into positive zero. NaN results may never be written into registers or memory. Only division and multiplication must be checked for NaN results (`0.0 / 0.0` and `0.0 * Infinity` result in NaN). | ||||
| 
 | ||||
| ##### FPROUND | ||||
| The FPROUND instruction changes the rounding mode for all subsequent FPU operations depending on the two least-significant bits of A. | ||||
| 
 | ||||
| |A[1:0]|rounding mode| | ||||
| |-------|------------| | ||||
| |00|roundTiesToEven| | ||||
| |01|roundTowardNegative| | ||||
| |10|roundTowardPositive| | ||||
| |11|roundTowardZero| | ||||
| 
 | ||||
| The rounding modes are defined by the IEEE-754 standard. | ||||
| 
 | ||||
| *The two-bit flag value exactly corresponds to bits 13-14 of the x86 `MXCSR` register and bits 23 and 22 (reversed) of the ARM `FPSCR` register.* | ||||
| 
 | ||||
| ### Control instructions | ||||
| The following 2 control instructions are supported: | ||||
| 
 | ||||
| |weight|instruction|function|condition| | ||||
| |-|-|-|-| | ||||
| |20|CALL|near procedure call|(see condition table below) | ||||
| |22|RET|return from procedure|stack is not empty | ||||
| 
 | ||||
| Both instructions are conditional. If the condition evaluates to `false`, CALL and RET behave as "arithmetic no-op" and simply copy operand A into destination C without jumping. | ||||
| 
 | ||||
| ##### CALL | ||||
| The CALL instruction uses a condition function, which takes the lower 32 bits of integer register `reg(b)` and the value `imm32` and evaluates a condition based on the `loc(b)` flag:  | ||||
| 
 | ||||
| |loc(b)[2:0]|signed|jump condition|probability|*x86*|*ARM* | ||||
| |---|---|----------|-----|--|----| | ||||
| |000|no|`reg(b)[31:0] <= imm32`|0% - 100%|`JBE`|`BLS` | ||||
| |001|no|`reg(b)[31:0] > imm32`|0% - 100%|`JA`|`BHI` | ||||
| |010|yes|`reg(b)[31:0] - imm32 < 0`|50%|`JS`|`BMI` | ||||
| |011|yes|`reg(b)[31:0] - imm32 >= 0`|50%|`JNS`|`BPL` | ||||
| |100|yes|`reg(b)[31:0] - imm32` overflows|0% - 50%|`JO`|`BVS` | ||||
| |101|yes|`reg(b)[31:0] - imm32` doesn't overflow|50% - 100%|`JNO`|`BVC` | ||||
| |110|yes|`reg(b)[31:0] < imm32`|0% - 100%|`JL`|`BLT` | ||||
| |111|yes|`reg(b)[31:0] >= imm32`|0% - 100%|`JGE`|`BGE` | ||||
| 
 | ||||
| The 'signed' column specifies if the operands are interpreted as signed or unsigned 32-bit numbers. Column 'probability' lists the expected jump probability (range means that the actual value for a specific instruction depends on `imm32`). *Columns 'x86' and 'ARM' list the corresponding hardware instructions (following a `CMP` instruction).* | ||||
| 
 | ||||
| Taken CALL instruction pushes the values `A` and `pc` (program counter) onto the stack and then performs a forward jump relative to the value of `pc`. The forward offset is equal to `16 * (imm8[6:0] + 1)`. Maximum jump distance is therefore 128 instructions forward (this means that at least 4 correctly spaced CALL instructions are needed to form a loop in the program). | ||||
| 
 | ||||
| ##### RET | ||||
| The RET instruction is taken only if the stack is not empty. Taken RET instruction pops the return address `raddr` from the stack (it's the instruction following the previous CALL), then pops a return value `retval` from the stack and sets `C = A XOR retval`. Finally, the instruction jumps back to `raddr`. | ||||
| 
 | ||||
| ## Reference implementation | ||||
| A portable C++ implementation of all ALU and FPU instructions is available in [instructionsPortable.cpp](../src/instructionsPortable.cpp). | ||||
| ## C.mask32 | ||||
| `C.mask32` is a 32-bit address mask that is used to calculate the write address for the C operand. `C.mask32` is equal to `imm32`. | ||||
|  |  | |||
		Loading…
	
	Add table
		Add a link
		
	
		Reference in a new issue