canonical binary representation for Array

Michele Zaffalon

Oct 13, 2016, 2:24:27 AM
to julia-users
I need to write a 4-dimensional array a to a file, and I use

write(f, a).

What is the canonical binary representation of a? It looks like the line above is equivalent to

write(f, reshape(a, prod(size(a))))

Is the canonical binary representation going to be machine and OS independent (except for the endianness)? What about reshape?

I am porting code from MATLAB and the specs for the file format are defined by the MATLAB code implementation.

michele


FANG Colin

Oct 13, 2016, 8:54:22 AM
to julia-users
Are you trying to serialise Julia objects? Why don't you try JSON, MessagePack, or something similar as your encoding?

The default serialize could also work, but be careful: it does not guarantee compatibility across Julia versions.

Páll Haraldsson

Oct 13, 2016, 9:00:28 AM
to julia-users

RawArray.jl (or one of its alternatives) may be what you need. At the least, the discussion below is informative, and looking at the code may tell you what you need to know:

https://groups.google.com/forum/#!searchin/julia-users/rawarray%7Csort:relevance/julia-users/ulkiPhGcv-0/TqyX8g9LBwAJ


On Thursday, October 13, 2016 at 6:24:27 AM UTC, Michele Zaffalon wrote:
I need to write a 4-dimensional array a to a file, and I use

write(f, a).

Do you need that for sure?
 
What is the canonical binary representation of a? It looks like the line above is equivalent to
 
Is the canonical binary representation going to be machine and OS independent

Why shouldn't it be?
 
(except for the endianness)? What about reshape?
I am porting code from MATLAB and the specs for the file format are defined by the MATLAB code implementation.

I'm not sure I'm answering your question, but I'm pretty sure DenseArrays are packed as tightly in memory as possible (I'm not sure whether three-byte values, if you had them, would be padded to 4 bytes):

julia> write(STDOUT, [1 2])
16

julia> write(STDOUT, [0x1 0x2])
2

I'm pretty sure multiple dimensions change nothing: you get Julia's order with no gaps, which is column-major, isn't it?

Complex numbers are a problem for some reason I didn't look into (see the thread); RawArray handles them, but HDF5 (strangely) doesn't.
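To check the "no gaps, column-major" claim, here is a minimal sketch. It assumes a recent Julia 1.x (the session above uses the older STDOUT name) and writes into an in-memory IOBuffer instead of a file; the array shape and element type are picked only for illustration.

```julia
# A 3-dimensional Int32 array holding 1..24 in storage order.
a = reshape(collect(Int32, 1:24), 2, 3, 4)

buf = IOBuffer()
nbytes = write(buf, a)      # write returns the number of bytes written
bytes = take!(buf)

# No padding: exactly 24 elements of 4 bytes each.
@assert nbytes == sizeof(a) == 24 * sizeof(Int32)

# Reinterpreting the raw bytes gives the elements back in column-major
# order, i.e. the first index varies fastest.
flat = reinterpret(Int32, bytes)
@assert flat == vec(a)
@assert flat[2] == a[2, 1, 1]
@assert flat[3] == a[1, 2, 1]
```

If the assertions pass, the byte stream is just the elements of a laid out back to back, first index fastest.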

Pointers in arrays would be dangerous..


In the docs, you are pointed to:

You can write multiple values with the same write call. i.e. the following are equivalent:


write(stream, x, y...)
write(stream, x) + write(stream, y...)

Is this for sure true? Not:
write(stream, x); write(stream, y...)
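As far as I can tell, the + in the docs refers to the return values: write returns the number of bytes written, so the combined call returns the sum of the individual byte counts. A small sketch, assuming a recent Julia 1.x and using IOBuffer as a stand-in for the stream (the values written are arbitrary):

```julia
buf = IOBuffer()
# One call with several values returns the total byte count.
n_combined = write(buf, Int32(1), 0x02, 3.0)

buf2 = IOBuffer()
# Separate calls, with the returned byte counts added up by hand.
n_separate = write(buf2, Int32(1)) + write(buf2, 0x02) + write(buf2, 3.0)

# Int32 is 4 bytes, UInt8 is 1 byte, Float64 is 8 bytes.
@assert n_combined == n_separate == 4 + 1 + 8

# Both streams end up with identical contents.
@assert take!(buf) == take!(buf2)
```

So both readings agree on what lands in the stream; the + form only adds that the returned counts sum up.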


I like looking up directly what Julia does, e.g.:
@edit prod([]) # I do first: ENV["EDITOR"] = "vi"

In your case it seems to be:
prod(f::Callable, a) = mapreduce(f, *, a)

 * Why do I find SparseArrays (only plural), that I think you do not care about, but DenseArray singular as expected?

help?> Sparse

Michele Zaffalon

Oct 13, 2016, 9:30:55 AM
to julia...@googlegroups.com
I guess I was not clear enough: I cannot change the file format, so a Julia-specific solution is not going to work.

Steven G. Johnson

Oct 13, 2016, 9:38:36 AM
to julia-users
Calling write on a numeric array will output the raw bytes, i.e. column-major data in the native byte order.

Matlab arrays are also column major, so reading a Matlab-produced binary format is probably straightforward, but you have to be careful of the byte order. Obviously, you'll have to read the documentation on your file format carefully. Whatever the format is, however, Julia offers enough low-level control to read and write it efficiently.
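A minimal sketch of pinning the byte order instead of relying on the native one, assuming a recent Julia 1.x and assuming (purely for illustration) that the target format stores little-endian Float64 values. The temporary file and the array shape are made up for the example.

```julia
# A 4-dimensional Float64 array with known contents.
a = reshape(collect(Float64, 1:24), 2, 3, 2, 2)

path = tempname()
open(path, "w") do f
    # htol converts from host byte order to little-endian;
    # it is a no-op on little-endian machines.
    write(f, htol.(a))
end

# Read the raw data back into an array of the same shape.
b = Array{Float64}(undef, size(a))
open(path, "r") do f
    read!(f, b)
end
b .= ltoh.(b)   # little-endian back to host byte order

@assert b == a
rm(path)
```

For a big-endian format, hton and ntoh play the same roles.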

Michele Zaffalon

Oct 13, 2016, 11:15:14 AM
to julia...@googlegroups.com
I wish there was documentation for the file format aside from the MATLAB implementation, which also just uses `reshape`.

The only sure thing about the format is that the first byte of the file encodes the endianness.
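A sketch of how a reader could honour such a flag byte, assuming a recent Julia 1.x. Everything about the layout here is hypothetical: the flag values (0x00 for little-endian, 0x01 for big-endian), the Float64 element type, and the helper name read_payload are assumptions, since the real format is only defined by the MATLAB code.

```julia
# Hypothetical layout: one flag byte (0x00 = little-endian, 0x01 = big-endian),
# followed by the raw Float64 data of an array whose dimensions are known.
function read_payload(io::IO, dims::Dims)
    file_is_little = read(io, UInt8) == 0x00
    host_is_little = ENDIAN_BOM == 0x04030201
    data = Array{Float64}(undef, dims)
    read!(io, data)
    # Swap bytes only when file and host byte orders differ.
    return file_is_little == host_is_little ? data : bswap.(data)
end

# Build a big-endian test payload in memory and read it back.
a = reshape(collect(Float64, 1:6), 2, 3)
io = IOBuffer()
write(io, 0x01)        # hypothetical "big-endian" flag
write(io, hton.(a))    # hton converts host order to big-endian
seekstart(io)
@assert read_payload(io, size(a)) == a
```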


Michele Zaffalon

Oct 19, 2016, 3:03:49 AM
to julia-users
On Thursday, October 13, 2016 at 3:38:36 PM UTC+2, Steven G. Johnson wrote:
Calling write on a numeric array will output the raw bytes, i.e. column-major data in the native byte order.


Would it be a reasonable assumption that reshaping will not change the ordering in future Julia implementations?

Páll Haraldsson

Oct 21, 2016, 8:04:28 PM
to julia-users

[PyCall already allows switching to row-major, but there's a penalty then.]


From memory, I recall some talk that row-major could (in theory) be made selectable, but I can now only find:

https://groups.google.com/forum/#!topic/julia-dev/Q-LFNapBdb0
"I like the way of Numpy of choose at creation time if order in memory has to be "fortran-style" (column-major) or "c-style" (row-major). If you now that operations are going to be intensive over columns, you can choice create a fortran-style matrix (and this avoid the time of create a c-style and do and transposition of the matrix)."
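If a file format does want row-major ("c-style") data, one way to produce it from a column-major Julia array is to permute the dimensions before writing. A minimal 2-dimensional sketch, assuming a recent Julia 1.x; for N dimensions the permutation would be the reversed dimension order.

```julia
a = reshape(collect(Int32, 1:6), 2, 3)   # [1 3 5; 2 4 6]

buf = IOBuffer()
# Swapping the two dimensions makes the column-major dump of the result
# equal to a row-major dump of a.
write(buf, permutedims(a, (2, 1)))
rowmajor = reinterpret(Int32, take!(buf))

# Row by row: first row 1 3 5, then second row 2 4 6.
@assert rowmajor == Int32[1, 3, 5, 2, 4, 6]
```

Note that permutedims copies the data, which is the kind of transposition cost the quote above is talking about.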

Tobias Knopp

Oct 23, 2016, 1:35:39 PM
to julia-users
Dear Michele,

Yes, the assumption is absolutely valid. Julia arrays are stored in column-major order in memory, and this will not change. Since write simply dumps that memory to disk, its output is not going to change either.
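A quick way to convince yourself, sketched for a recent Julia 1.x: reshape only reinterprets the shape of the same memory, so every reshaped view of an array should dump to the same bytes. The helper name dump_bytes is made up for this example.

```julia
# Hypothetical helper: the bytes that write would emit for x.
function dump_bytes(x)
    buf = IOBuffer()
    write(buf, x)
    return take!(buf)
end

a = rand(Float64, 2, 3, 4, 5)   # a 4-dimensional array

# The flattened form from the original question gives the same bytes...
@assert dump_bytes(a) == dump_bytes(reshape(a, prod(size(a))))
# ...as does vec, and any other shape with the same number of elements.
@assert dump_bytes(a) == dump_bytes(vec(a))
@assert dump_bytes(a) == dump_bytes(reshape(a, 6, 20))
```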

Cheers,

Tobi

Michele Zaffalon

Oct 24, 2016, 1:28:22 AM
to julia...@googlegroups.com
Thank you, Tobi.
michele