I have tried several approaches, so far without luck. Below are a couple of the techniques I tried; please let me know what I am missing.
Quick recap: I am trying to retrieve float values from the xmm registers. The first lesson I learned is that you cannot access the xmm registers the same way you access integer registers: integers are held in general-purpose registers, whereas floats use the Streaming SIMD Extensions (SSE) registers. Each of these Single Instruction, Multiple Data (SIMD) registers holds four single-precision floating-point values of 4 bytes each, for a total of 16 bytes per xmm register.
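To make the lane layout concrete, here is a minimal sketch using the SSE intrinsics from xmmintrin.h. The intrinsics and the helper names xmm_to_floats and floats_to_xmm are my own assumptions for illustration and are not part of the code I am debugging; it also assumes an x86 or x86-64 compiler.

```c
#include <xmmintrin.h> /* SSE intrinsics; x86/x86-64 only */

/* Hypothetical helper: copy the four 32-bit lanes of a 128-bit SSE
   value into out[0..3]. Lane 0 is the lowest 32 bits of the register. */
static void xmm_to_floats(__m128 v, float out[4])
{
    /* _mm_storeu_ps places no alignment requirement on the destination. */
    _mm_storeu_ps(out, v);
}

/* Hypothetical helper: build a 128-bit value from four floats,
   with f0 ending up in lane 0. */
static __m128 floats_to_xmm(float f0, float f1, float f2, float f3)
{
    /* _mm_set_ps takes its arguments from the highest lane down. */
    return _mm_set_ps(f3, f2, f1, f0);
}
```

A value of type __m128 occupies 16 bytes, which matches the four 4-byte floats described above.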
With this knowledge, I tried to access xmm using inline asm code as follows. The SSE instructions have the suffix -ss for scalar operations (Scalar Single-precision) and -ps for packed operations (Packed Single-precision). Here is a good resource I found:
https://www.songho.ca/misc/sse/sse.html
The following multiplication works:
float rfx;
asm("mulss %%xmm0,%%xmm1" : "=r"(rfx)); // multiplication of two working xmm registers
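The difference between the two suffixes can be sketched with the corresponding intrinsics, _mm_mul_ss and _mm_mul_ps. This is only an illustration under the assumption of an x86 or x86-64 compiler; the function names mul_scalar and mul_packed are hypothetical.

```c
#include <xmmintrin.h> /* SSE intrinsics; x86/x86-64 only */

/* Scalar form (mulss): only lane 0 is multiplied;
   lanes 1..3 of the result are copied unchanged from a. */
static void mul_scalar(const float a[4], const float b[4], float out[4])
{
    __m128 r = _mm_mul_ss(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(out, r);
}

/* Packed form (mulps): all four lanes are multiplied pairwise. */
static void mul_packed(const float a[4], const float b[4], float out[4])
{
    __m128 r = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    _mm_storeu_ps(out, r);
}
```

With a = {1, 2, 3, 4} and b = {10, 10, 10, 10}, the scalar form gives {10, 2, 3, 4} and the packed form gives {10, 20, 30, 40}.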
However, copying data does not work. I tried several ways, but many of them ended with errors.
asm("movaps [eax],%xmm0"); //-not working with errors
Also tried data alignment as follows:
__attribute__((aligned (16))) float b[4000];
asm("movaps [b], %xmm0");
dr_fprintf(STDERR, "..reading xmm0 register value.......%f...\n", b );
//dr_fprintf(STDERR, "..reading register value.......%f.....%f....%f.....%f...\n", b[0], b[1], b[2], b[3] ); //another variation
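For comparison, here is a minimal sketch of what a copy between memory and xmm0 might look like in GCC extended asm. It assumes GCC or Clang on x86-64 with the default AT&T syntax, where the source operand comes first, a memory operand is written as (%reg) rather than [reg], and register names need a doubled %% once the asm statement has operands. The function name roundtrip_xmm0 is hypothetical.

```c
/* Assumes GCC or Clang on x86-64, default AT&T operand order. */

/* Hypothetical helper: load in[0..3] into xmm0, then store xmm0 to
   out[0..3]. Both pointers must be 16-byte aligned, because movaps
   faults on an unaligned memory operand. */
static void roundtrip_xmm0(const float *in, float *out)
{
    __asm__ volatile(
        "movaps (%1), %%xmm0\n\t"  /* memory at in  -> xmm0 */
        "movaps %%xmm0, (%0)\n\t"  /* xmm0 -> memory at out */
        :                          /* no register outputs */
        : "r"(out), "r"(in)        /* both addresses passed in registers */
        : "xmm0", "memory");       /* tell the compiler what is modified */
}
```

Note that a store like this only captures whatever happens to be in xmm0 at the moment the asm executes; the compiler is free to have used the register for something else before that point.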
Register Offset
%rdi 0
%rsi 8
%rdx 16
%rcx 24
%r8 32
%r9 40
%xmm0 48
%xmm1 64
. . .
%xmm15 288
Hence, I tried to reach xmm0 by offsetting from the address of register r9, as below, but could not get it working.
drwrap_context_t *wrapcxt_t = (drwrap_context_t *)wrapcxt;
__attribute__((aligned (4))) float xmmf1[1000];
float xmmf1a, xmmf1b, xmmf1c, xmmf1d;
dr_fprintf(STDERR, ".....wrapcxt context xmm0 f4....: %p\n", &wrapcxt_t->mc->r9 + 0x08);
dr_fprintf(STDERR, ".....wrapcxt context xmm0 f4....: %f\n", *(float *)(&wrapcxt_t->mc->r9 + 0x08));
memcpy(&xmmf1, (&wrapcxt_t->mc->r9 + 0x08), sizeof(float));
dr_fprintf(STDERR, ".....xmmf1=%f\n",xmmf1);
memcpy(&xmmf1a, (&wrapcxt_t->mc->r9 + 0x08), sizeof(float));
dr_fprintf(STDERR, ".....xmmf1a=%f\n",xmmf1a);
dr_fprintf(STDERR, ".....wrapcxt context xmm0 f8....: %p\n", &wrapcxt_t->mc->r9 + 0x0C);
dr_fprintf(STDERR, ".....wrapcxt context xmm0 f8....: %f\n", *(float *) (&wrapcxt_t->mc->r9 + 0x0C));
memcpy(&xmmf1b, (&wrapcxt_t->mc->r9 + 0x0C), sizeof(float));
dr_fprintf(STDERR, ".....xmmf1b=%f\n",xmmf1b);
dr_fprintf(STDERR, ".....wrapcxt context xmm0 f12....: %p\n", &wrapcxt_t->mc->r9 + 0x10);
dr_fprintf(STDERR, ".....wrapcxt context xmm0 f12....: %f\n", *(float *)(&wrapcxt_t->mc->r9 + 0x10));
memcpy(&xmmf1c, (&wrapcxt_t->mc->r9 + 0x10), sizeof(float));
dr_fprintf(STDERR, ".....xmmf1c=%f\n",xmmf1c);
dr_fprintf(STDERR, ".....wrapcxt context xmm0 f16....: %p\n", &wrapcxt_t->mc->r9 + 0x14);
dr_fprintf(STDERR, ".....wrapcxt context xmm0 f16....: %f\n", *(float *)(&wrapcxt_t->mc->r9 + 0x14));
memcpy(&xmmf1d, (&wrapcxt_t->mc->r9 + 0x14), sizeof(float));
dr_fprintf(STDERR, ".....xmmf1d=%f\n",xmmf1d);
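One thing that can be checked in isolation is how far an expression of the form `&field + n` actually moves. In C, pointer arithmetic is scaled by the size of the pointed-to type, so the step is n elements, not n bytes. The sketch below uses a plain uint64_t array as a hypothetical stand-in for 8-byte register slots; it is not the real machine-context layout, and the function names are my own.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte distance covered by `base + n` when base points to 8-byte
   elements: the compiler scales n by sizeof(uint64_t). */
static ptrdiff_t typed_step(uint64_t *base, int n)
{
    return (char *)(base + n) - (char *)base;
}

/* Byte distance covered by `(char *)base + n`: after the cast,
   the step is exactly n bytes. */
static ptrdiff_t byte_step(uint64_t *base, int n)
{
    return ((char *)base + n) - (char *)base;
}
```

With a 32-element array, typed_step(slots, 0x08) is 64 while byte_step(slots, 0x08) is 8. If r9 in the code above is an 8-byte field, as I would expect on a 64-bit build, then `&wrapcxt_t->mc->r9 + 0x08` points 64 bytes past r9 rather than 8.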
Any help would be appreciated.