* Newer Intel and AMD CPUs have SSE extensions for SHA-1 and SHA-256 acceleration.
* Add new cpu.c/cpu.h sources to detect the extensions, and use them in checksum.c
if available.
* Acceleration code is taken from https://github.com/noloader/SHA-Intrinsics.
* Some "unofficial" Windows ISOs use a custom boot.wim that only includes the Setup
image at index 1, rather than at index 2, after the PE image, for official ISOs.
* Also refactor to add a long needed vhd.h header.
* Also fix a MinGW warning.
Yes!!! We are finally *much* faster than 7-zip for SHA-256, even though
we are also computing MD5 and SHA-1 in parallel. Here are some averaged
comparative results, against the 5.71 GB Win10_20H2_EnglishInternational_x64.iso
(SHA-256 = 08535b6dd0a4311f562e301c3c344b4aefd2e69a82168426b9971d6f8cab35e1):
* Windows' PowerShell Get-FileHash: 48s
* 7-zip's SHA-256 : 31s
* Rufus (64-bit release version) : 23s