* this better matches CPU capabilities since execution ports are usually split 1:1 between fadd and fmul
* the frequency of FSWAP_R decreased from 8 to 4 (it's ASIC-friendly)
* activate IROL_R instruction
* added detailed guidelines for the selection of configuration values
* added additional compile-time checks to prevent bad configurations
* removed RANDOMX_SUPERSCALAR_MAX_SIZE parameter