I see a number of "issues", many with helpful patches, but no updates. In addition, my own testing reveals that
return ((cpu_info[3] & (1 << 20)) != 0);
should be:
return ((cpu_info[2] & (1 << 20)) != 0);
from what I can tell, since SSE 4.2 is reported in ECX (index 2), not EDX (index 3). (Compare with the Intel-provided code:
void get_cpuid_info( CPUIDinfo *Info, const unsigned int leaf, const unsigned int subleaf )
{
    // Stores CPUID return Info in the CPUIDinfo structure.
    // leaf and subleaf used as parameters to the CPUID instruction
    // parameters and register usage designed to be safe for both Win and Linux
    // when using -use-msasm
    __asm
    {
        mov edx, Info; addr of start of output array
        mov eax, leaf; leaf
        mov ecx, subleaf; subleaf
        push edi
        push ebx
        mov edi, edx; edi has output addr
        cpuid
        mov DWORD PTR[edi], eax
        mov DWORD PTR[edi + 4], ebx
        mov DWORD PTR[edi + 8], ecx
        mov DWORD PTR[edi + 12], edx
        pop ebx
        pop edi
    }
    return;
}
#define SSE4_1_FLAG 0x080000
#define SSE4_2_FLAG 0x100000
...
// The code first determines if the processor is an Intel Processor. If it is, then
// feature flags bit 19 (SSE 4.1) and 20 (SSE 4.2) in ECX after CPUID call with EAX = 0x1
// are checked.
// If both bits are 1 (indicating both SSE 4.1 and SSE 4.2 exist) then
// the function returns 1
const int CHECKBITS = SSE4_1_FLAG | SSE4_2_FLAG;
// execute CPUID with eax (leaf) = 1 to get feature bits,
// subleaf doesn't matter so set it to zero
get_cpuid_info( &Info, 0x1, 0x0 );
if ( ( Info.ECX & CHECKBITS ) == CHECKBITS )
{
    rVal = 1;
}
... )
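For anyone who wants to check this themselves, here is a minimal sketch of the corrected test. It is not the project's code: the names sse42_bit_set and cpu_has_sse42 are made up for illustration, and I am assuming a GCC or Clang build on x86, where <cpuid.h> provides __get_cpuid. On any other compiler or architecture the sketch simply reports 0. The array order (EAX, EBX, ECX, EDX) matches what the Intel code above stores, so SSE 4.2 is bit 20 of index 2.

```c
#include <stdint.h>

/* Assumption: GCC/Clang on x86 ships <cpuid.h> with __get_cpuid(). */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define HAVE_GNU_CPUID 1
#else
#define HAVE_GNU_CPUID 0
#endif

/* Position of each register in a 4-element CPUID result array,
   in the same order the Intel sample stores them. */
enum { REG_EAX = 0, REG_EBX = 1, REG_ECX = 2, REG_EDX = 3 };

/* Same value as SSE4_2_FLAG (0x100000) in the Intel sample. */
#define SSE4_2_BIT (UINT32_C(1) << 20)

/* Hypothetical helper: tests the SSE 4.2 bit in a leaf-1 CPUID result.
   The flag lives in ECX (index 2), not EDX (index 3). */
static int sse42_bit_set(const uint32_t regs[4])
{
    return (regs[REG_ECX] & SSE4_2_BIT) != 0;
}

/* Hypothetical wrapper: returns 1 if CPUID leaf 1 reports SSE 4.2,
   0 otherwise (including builds where CPUID is not available here). */
static int cpu_has_sse42(void)
{
#if HAVE_GNU_CPUID
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid returns 0 when the requested leaf is unsupported. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;

    const uint32_t regs[4] = { eax, ebx, ecx, edx };
    return sse42_bit_set(regs);
#else
    return 0;
#endif
}
```

Keeping the bit test separate from the CPUID call means the index can be checked with hand-made register values, which is how I would confirm the cpu_info[2] versus cpu_info[3] question without depending on the machine it runs on.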
I'm looking for a fast CRC32 implementation, but this one seems to be a) too complex to understand unless you are an expert, and b) buggy.
Unfortunate really.