I did a test locally using protobuf-net (since that is what I'm most familiar with); to get an output of about 3,784,000 bytes, I used a count of 172,000 items in the inner array - does that sound about right? Then I tested it in a loop as per this gist:
https://gist.github.com/mgravell/10a21970531485008731d700b89ec732
The timings I get there are about 25ms to serialize and 40ms to deserialize, although my machine is quite fast, so your numbers may well already be "about right". What sort of numbers are you looking for here? Note that in .NET the first run will always be slightly slower due to JIT compilation, so it is worth excluding it from any timings.
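As a rough sketch of keeping the JIT cost out of the measurement: run the operation once untimed, then time a loop with Stopwatch and report the average. The `Bench` class and `AverageMilliseconds` method are hypothetical helpers, not part of any library; the action you pass in stands for whatever serialize or deserialize call you're actually timing. Keeping the zmq send out of that action also tells you how much of the cost is serialization alone.

```csharp
using System;
using System.Diagnostics;

public static class Bench
{
    // Runs `action` once untimed (so one-off JIT compilation happens outside
    // the measurement), then times `iterations` further runs and returns the
    // average time per run in milliseconds.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        if (action is null) throw new ArgumentNullException(nameof(action));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        action(); // warm-up run: pays the JIT cost, result discarded

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        watch.Stop();
        return watch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

For example, `Bench.AverageMilliseconds(() => Serialize(data), 100)`, where `Serialize` and `data` are placeholders for your own code.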
Additionally, I know nothing about the zmq overheads - are you including the zmq cost in your numbers?
If you want to squeeze out the last few drops of performance, you usually can: for example, by checking whether zmq allows you to pass a stream, a span, or an *oversized* array. Again, I'm not as familiar with the Google C# version as I am with protobuf-net, but *if* (and it is a huge "if") "ToByteArray()" is essentially writing to a MemoryStream and then calling ToArray, you can probably avoid an extra blit and some allocations by providing your own reused MemoryStream and using GetBuffer to access the oversized array, remembering to limit yourself to just the first .Length bytes. But again, a lot of this depends on the specifics of zmq and the Google C# version. It is almost certainly a case of diminishing returns.
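A minimal sketch of the reused-buffer idea, assuming the transport can accept an array plus a length. `ReusableSerializer`, `writePayload` and `send` are hypothetical names: `writePayload` stands for any serializer that can write to a Stream (protobuf-net's `Serializer.Serialize(stream, obj)` is one such call), and `send` stands for whatever actually hands the bytes to zmq. The stream is reset rather than reallocated, and GetBuffer() exposes the (possibly oversized) backing array without the copy that ToArray() makes.

```csharp
using System;
using System.IO;

public sealed class ReusableSerializer
{
    // One stream for the lifetime of this object; its backing array grows
    // as needed and is then reused, so the stream itself stops reallocating
    // once it has reached the largest payload size.
    private readonly MemoryStream _stream = new MemoryStream();

    // writePayload: hypothetical serializer callback that writes into the stream.
    // send: hypothetical transport callback taking (buffer, count); only the
    // first `count` bytes of `buffer` are valid, the rest is spare capacity.
    // `send` must be finished with the buffer before the next call, because
    // the same array gets overwritten.
    public void SerializeAndSend(Action<Stream> writePayload, Action<byte[], int> send)
    {
        _stream.Position = 0;
        _stream.SetLength(0);                 // resets length; capacity is kept
        writePayload(_stream);

        byte[] buffer = _stream.GetBuffer();  // no copy, unlike ToArray()
        int count = (int)_stream.Length;      // only this many bytes are meaningful
        send(buffer, count);
    }
}
```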
So: what numbers are you *looking to get*? What would be "acceptable"? And how complex is your data? Is what you've shown the *only* data you need to transfer? If so, it might not be a bad candidate for fully manual, explicit serialization without a library - just a payload of:
Height [int32 fixed 4 bytes]
Width [int32 fixed 4 bytes]
Time [int64 fixed 8 bytes]
ElementCount [int32 fixed 4 bytes]
then, for each element, 16 bytes consisting of:
  X [float fixed 4 bytes]
  Y [float fixed 4 bytes]
  Z [float fixed 4 bytes]
  RGB [int32 fixed 4 bytes]
This would require manual coding; it would typically outperform the other options, but it would be more brittle and would require you to be reasonably good at IO code.
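For illustration, here is a rough sketch of that layout using `System.Buffers.Binary.BinaryPrimitives`. The `CloudPoint` type, the `ManualCodec` class and the parameter names are hypothetical stand-ins for the data in the question, and little-endian byte order is an assumption: any order works as long as both ends agree. Under this layout the header is 20 bytes, so the 172,000-element test case would come to 20 + 16 × 172,000 = 2,752,020 bytes.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical element type standing in for the question's data.
public readonly struct CloudPoint
{
    public CloudPoint(float x, float y, float z, int rgb) { X = x; Y = y; Z = z; Rgb = rgb; }
    public float X { get; }
    public float Y { get; }
    public float Z { get; }
    public int Rgb { get; }
}

public static class ManualCodec
{
    public const int HeaderSize = 4 + 4 + 8 + 4; // Height, Width, Time, ElementCount
    public const int ElementSize = 4 * 4;        // X, Y, Z, RGB

    public static int GetSize(int count) => checked(HeaderSize + ElementSize * count);

    // Writes the fixed layout (little-endian, by assumption); returns bytes written.
    public static int Write(Span<byte> destination, int height, int width, long time, ReadOnlySpan<CloudPoint> points)
    {
        int size = GetSize(points.Length);
        if (destination.Length < size) throw new ArgumentException("Buffer too small.", nameof(destination));

        BinaryPrimitives.WriteInt32LittleEndian(destination, height);
        BinaryPrimitives.WriteInt32LittleEndian(destination.Slice(4), width);
        BinaryPrimitives.WriteInt64LittleEndian(destination.Slice(8), time);
        BinaryPrimitives.WriteInt32LittleEndian(destination.Slice(16), points.Length);

        var body = destination.Slice(HeaderSize);
        for (int i = 0; i < points.Length; i++)
        {
            var slot = body.Slice(i * ElementSize, ElementSize);
            // Floats are written via their raw 32-bit pattern.
            BinaryPrimitives.WriteInt32LittleEndian(slot, BitConverter.SingleToInt32Bits(points[i].X));
            BinaryPrimitives.WriteInt32LittleEndian(slot.Slice(4), BitConverter.SingleToInt32Bits(points[i].Y));
            BinaryPrimitives.WriteInt32LittleEndian(slot.Slice(8), BitConverter.SingleToInt32Bits(points[i].Z));
            BinaryPrimitives.WriteInt32LittleEndian(slot.Slice(12), points[i].Rgb);
        }
        return size;
    }

    // Reads the same layout back, validating lengths before touching the body.
    public static (int Height, int Width, long Time, CloudPoint[] Points) Read(ReadOnlySpan<byte> source)
    {
        if (source.Length < HeaderSize) throw new ArgumentException("Truncated header.", nameof(source));
        int height = BinaryPrimitives.ReadInt32LittleEndian(source);
        int width = BinaryPrimitives.ReadInt32LittleEndian(source.Slice(4));
        long time = BinaryPrimitives.ReadInt64LittleEndian(source.Slice(8));
        int count = BinaryPrimitives.ReadInt32LittleEndian(source.Slice(16));
        if (count < 0 || (source.Length - HeaderSize) / ElementSize < count)
            throw new ArgumentException("Truncated payload.", nameof(source));

        var points = new CloudPoint[count];
        var body = source.Slice(HeaderSize);
        for (int i = 0; i < count; i++)
        {
            var slot = body.Slice(i * ElementSize, ElementSize);
            points[i] = new CloudPoint(
                BitConverter.Int32BitsToSingle(BinaryPrimitives.ReadInt32LittleEndian(slot)),
                BitConverter.Int32BitsToSingle(BinaryPrimitives.ReadInt32LittleEndian(slot.Slice(4))),
                BitConverter.Int32BitsToSingle(BinaryPrimitives.ReadInt32LittleEndian(slot.Slice(8))),
                BinaryPrimitives.ReadInt32LittleEndian(slot.Slice(12)));
        }
        return (height, width, time, points);
    }
}
```

Because every element is a fixed 16 bytes, the reader can check the total length up front. The brittleness is real, though: any change to the layout has to be made on both ends at once, with no versioning to fall back on.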
Personally, I'd probably leave it alone...
Marc