Alignment of large binary arrays

Greg Allen

Apr 28, 2015, 10:56:19 AM
to msg...@googlegroups.com
I often deal with large arrays of binary data for scientific applications. For example:

myMultiChannelData:
  numChannels: 1024
  numSamples: 4096
  sampleRate: 48000
  samples: float[numChannels][numSamples]

I want to avoid copying the samples because they're large and that would be expensive. I'd like to get a pointer to the samples and process them directly where they are. It's not hard to imagine a deserializer where you could do that.

That also means I'd like the samples to be aligned. Not just float-aligned, but maybe even vector-aligned (AVX-512 is a thing now). Perhaps I want to use FFTW, MKL, BLAS/LAPACK, ATLAS, or any other high-performance library with this data. Here is an old article on the importance of alignment [http://www.ibm.com/developerworks/library/pa-dalign/]. Misalignment isn't penalized as heavily as it used to be, but alignment is still very important for high performance.

I haven't seen anything in the msgpack format that could be used as padding for alignment. A possibility is a no-op format byte similar to that in UBJSON [http://ubjson.org/type-reference/value-types/#noop]. The 0xc1 format code seems like a prime candidate.

Another possibility is a dummy alignment-padding field right before the samples field. That's really ugly, but could be made to work.

The general idea could be, for example (sorry, I'm not very familiar with the msgpack API, so treat this as pseudocode):

// to serialize
void* myPageAlignedPtr = mmap(some_file_for_writing, len);
inplace_packer packer(myPageAlignedPtr,len);
// pack the other fields into the packer
// and then pack the samples with alignment
size_t arrayLen = numChannels*numSamples*sizeof(float);
size_t alignment = 16*sizeof(float); // 64 bytes, i.e. 512-bit alignment
void* samples = packer.pack_aligned_binary_array(arrayLen, alignment);
// this issues up to (alignment-1) no-ops such that
// (samples%alignment == myPageAlignedPtr%alignment)

// now we can use samples with any high-performance library, e.g.
fft(source, samples, numChannels*numSamples);

Then we could send this through any middleware that preserves alignment. On the deserialize side we could get a pointer to samples, and similarly have it be aligned.
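
On the receiving side, a consumer would probably want to verify the alignment before handing the pointer to a vectorized library. A minimal sketch of such a check follows; the helper name `is_aligned` is hypothetical and not part of msgpack.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of msgpack): true if p is a multiple of
// `alignment` bytes. `alignment` must be non-zero.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

If the check fails, the consumer can fall back to copying the samples into an aligned buffer, which is the cost the padding is meant to avoid.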

Thoughts or suggestions?

Thanks,
-Greg

Greg Allen

May 5, 2015, 7:07:10 PM
to msg...@googlegroups.com
Updated. I often deal with large arrays of binary data for scientific applications. For example:

class DataSamples {
    unsigned numChannels;
    unsigned numSamples;
    float sampleRate;
//    float* samples; // float [numChannels][numSamples]; use raw_ref.ptr
    msgpack::type::raw_ref ref;
...
    MSGPACK_DEFINE(numChannels,numSamples,sampleRate,ref);
};

I want to avoid copying the samples because they're large and that would be expensive. MsgPack has the ability to do zero-copy deserialization, which is great!

It's this easy, and completely zero-copy:

// tell deserializer to use a reference
bool my_reference_func(msgpack::type::object_type, std::size_t, void*) {
    return true; // always reference the input buffer instead of copying
}

{
    // deserialize it from mapped file, with reference func for zero-copy
    MappedReadFile ifile("test.mpack");
    msgpack::unpacked msg;
    msgpack::unpack(msg, (const char*)ifile.get_ptr(), ifile.get_len(),
                    my_reference_func);

    DataSamples d2;
    msg.get() >> d2;
}

However, the serializer can't align the output to make the zero-copy deserializer useful. For example, when I serialize DataSamples (above) I get:

$ hexdump test.mpack
0000000 94 cd 04 00 cd 10 00 ca 47 3b 80 00 c6 01 00 00
0000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
1000010 00                                            

The buffer of floats that I wrote is no longer float-aligned, because the headers and fields preceding the samples occupy 17 bytes, so the first sample lands at file offset 0x11. If only the serializer could be told to float-align that huge array of floats...
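
The 17 bytes can be accounted for from the hexdump. The sketch below just adds up the encoded sizes defined by the MessagePack spec; the constant names are made up for illustration.

```cpp
#include <cstddef>

// Encoded sizes (type byte + payload) as defined by the MessagePack spec.
// The constant names are illustrative only.
constexpr std::size_t kFixArrayHeader = 1;  // 0x94: array of 4 elements
constexpr std::size_t kUint16         = 3;  // 0xcd + 2 bytes (numChannels, numSamples)
constexpr std::size_t kFloat32        = 5;  // 0xca + 4 bytes (sampleRate)
constexpr std::size_t kBin32Header    = 5;  // 0xc6 + 4-byte big-endian length

// Offset of the first sample byte, counted from the start of the file.
constexpr std::size_t kSamplesOffset =
    kFixArrayHeader + 2 * kUint16 + kFloat32 + kBin32Header;

static_assert(kSamplesOffset == 17, "matches the hexdump");
static_assert(kSamplesOffset % 4 != 0, "not aligned for a 4-byte float");
```

Even if the file is mapped at a page-aligned address, the samples therefore start one byte past a 4-byte boundary.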

I've made a small patch to my copy of MsgPack that lets me specify alignment on serialization. For many payloads this isn't useful, but for scientific data sets, it could have a huge performance impact.

With my small patch, I can request aligned serialization for a binary buffer, thus enabling zero-copy for a consuming deserializer.

{
    DataSamples dataSamples(1024,4096);
    size_t samplesSize = dataSamples.numChannels*dataSamples.numSamples*sizeof(float);
//    packer.pack_bin(samplesSize); // not aligned
    packer.pack_bin_aligned(samplesSize,sizeof(float));
    packer.pack_bin_body(dataSamples.ref.ptr,samplesSize); // samples live in the raw_ref
}

All this does is push out a few noop bytes so that the binary body is aligned according to the request. Unfortunately, that requires a noop format byte (0xc1, currently unused), which is a small addition to the format.
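
To illustrate the padding arithmetic, here is a rough standalone sketch. It is not the actual patch: the function names are hypothetical, it writes into a plain `std::vector` rather than a msgpack packer, it always emits a bin32 header for simplicity, and it assumes the buffer itself will start at an aligned address (e.g. a page-aligned mapping). The 0xc1 no-op byte is the proposed extension; the current spec marks 0xc1 as never used, so stock decoders would reject this output.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kNoop = 0xc1;            // proposed no-op (unused in the spec today)
constexpr std::uint8_t kBin32 = 0xc6;           // bin32 type byte
constexpr std::size_t kBin32HeaderSize = 5;     // type byte + 4-byte length

// Number of pad bytes so that (offset + pad + header_size) is a multiple
// of alignment, i.e. the payload following the header starts aligned.
inline std::size_t noop_padding(std::size_t offset, std::size_t header_size,
                                std::size_t alignment) {
    assert(alignment != 0);
    const std::size_t rem = (offset + header_size) % alignment;
    return rem == 0 ? 0 : alignment - rem;
}

// Appends the no-op padding and a bin32 header to buf.
// Returns the offset at which the (not yet written) payload will start.
inline std::size_t append_aligned_bin32_header(std::vector<std::uint8_t>& buf,
                                               std::uint32_t len,
                                               std::size_t alignment) {
    buf.insert(buf.end(),
               noop_padding(buf.size(), kBin32HeaderSize, alignment), kNoop);
    buf.push_back(kBin32);
    for (int shift = 24; shift >= 0; shift -= 8) {  // big-endian length
        buf.push_back(static_cast<std::uint8_t>((len >> shift) & 0xff));
    }
    return buf.size();
}
```

For the DataSamples example, 12 bytes precede the bin header, so float alignment costs 3 no-op bytes and moves the payload from offset 17 to offset 20; 64-byte alignment costs 47.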

This small change could provide a huge performance boost for scientific data sets (large, uniform matrices of data).

Thoughts? Is there anybody out there?

Thanks,
-Greg

Xavier Snelgrove

Jul 8, 2015, 12:55:07 PM
to msg...@googlegroups.com
I realize this is a couple months later, but FWIW I'm having this same issue and could also really use an "alignment" concept in msgpack. I'm memory mapping a huge file that contains some large float vectors, and would like to be able to perform math on those vectors without copying them first. I would support this no-op concept.

  Xavier