Read raw file (big or little endian)

Jimmy Johnson

May 11, 2023, 4:03:51 PM

to Eiffel Users

For a deep learning library, I need to read a raw file in "idx" format, as described at the bottom of the page at http://yann.lecun.com/exdb/mnist/ . It needs to be platform independent.

According to this format, the first integer is a magic number whose first two bytes should always be zero. I suppose I can read this integer, and if the first two bytes are not zero, then [for this platform] the bytes are arriving in the wrong order. In that case, calling a RAW_FILE feature that reads more than one byte (e.g. RAW_FILE.read_integer or RAW_FILE.read_natural_32) would give the wrong value, and I would have to swap the bytes around. Okay, I can do that.
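For illustration only, here is a minimal sketch in C (not Eiffel, and not taken from any Eiffel library) of one way to sidestep the platform question. As I read the page linked above, all integers in the files are stored MSB first, so the value can be assembled one byte at a time and the host byte order never matters. The names `be32_from_bytes`, `read_be32` and `idx_magic_is_valid` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assemble a 32-bit unsigned value from four bytes stored MSB first.
   Only arithmetic on single bytes is used, so the result is the same
   on little-endian and big-endian hosts. */
uint32_t be32_from_bytes(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Read one MSB-first 32-bit integer from f.
   Returns 1 on success, 0 if fewer than four bytes were available. */
int read_be32(FILE *f, uint32_t *out)
{
    unsigned char b[4];

    assert(f != NULL && out != NULL);
    if (fread(b, 1, 4, f) != 4)
        return 0;
    *out = be32_from_bytes(b);
    return 1;
}

/* The rule quoted above: the two high bytes of the magic number are zero. */
int idx_magic_is_valid(uint32_t magic)
{
    return (magic >> 16) == 0;
}
```

With this approach the zero-bytes check becomes a sanity test on the file rather than a way to detect the platform's byte order.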

The problem is: how do I read in bytes and then convert them to a REAL_32 or REAL_64? The NATURAL_xx and INTEGER_xx classes have bit-shift operations, but these operations are not available for the REAL_xx types.

Anyone have a canned solution for this? Is there a byte-swapping class in Eiffel?

Thanks,

jjj

r...@amalasoft.com

May 11, 2023, 5:02:55 PM

to eiffel...@googlegroups.com

For raw (and other) reads, I typically use read_stream and pass file.count as the length of the stream (i.e. the whole file).

R

--
You received this message because you are subscribed to the Google Groups "Eiffel Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to eiffel-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/eiffel-users/79b6efe1-3c57-4be4-a303-07a12150dcfan%40googlegroups.com.

Jimmy Johnson

May 11, 2023, 5:24:40 PM

to Eiffel Users

Okay, but how do you convert the values to reals, given a sequence of [byte? / CHARACTER_8? / CHARACTER_32?] values?

Louis M

May 11, 2023, 7:57:32 PM

to eiffel...@googlegroups.com

Hello,

Just to let you know that I have already created an MNIST data viewer. You can find it here: https://gitlab.com/tioui/eiffel-mnist-viewer .

You need the Eiffel Game2 Library to use the viewer, but note that the MNIST data reader is independent of the Game library: https://gitlab.com/tioui/eiffel-mnist-viewer/-/blob/master/mnist_images.e

Also, the method I used to convert the endianness of the data is shown here: https://gitlab.com/tioui/eiffel-mnist-viewer/-/blob/master/integer_reader.e

Good day,

Louis M

r...@amalasoft.com

May 11, 2023, 9:39:39 PM

to eiffel...@googlegroups.com

This might help. Once you have the endianness under control, you should be able to convert to an integer easily enough (multiply and shift, for example).

(Attachment: ael_sprt_endian.e)

Jimmy Johnson

May 12, 2023, 1:14:25 AM

to Eiffel Users

Wow! I think this is just what I was looking for. Thanks Louis and rfo (Roger?).

I have not tried it yet, but I assume the feature `as_integer_32' in the line

"last_integer_32 := l_natural.as_integer_32"

handles any sign bit?

Thanks again,

jjj

BTW, Louis, may I ask what the larger purpose of the MNIST reader is?

r...@amalasoft.com

May 12, 2023, 9:39:03 AM

to eiffel...@googlegroups.com

I whipped this up this morning. Might help.

R (yes, Roger)

    bytes_to_float (bytes: ARRAY [NATURAL_8]; lef: BOOLEAN): REAL_32
            -- Convert byte sequence into 32-bit floating point value.
            -- If `lef' then assume bytes are little endian, else assume
            -- they are big endian.
        require
            valid_count: bytes.count = 4
        local
            ilo: INTEGER
            ival, v1, v2, v3, v4: NATURAL_32
        do
            ilo := bytes.lower
            v1 := bytes.item (ilo).as_natural_32
            v2 := bytes.item (ilo + 1).as_natural_32
            v3 := bytes.item (ilo + 2).as_natural_32
            v4 := bytes.item (ilo + 3).as_natural_32
            if lef then
                    -- First byte is least significant.
                ival := ival + (v4 |<< 24)
                ival := ival + (v3 |<< 16)
                ival := ival + (v2 |<< 8)
                ival := ival + v1
            else
                    -- First byte is most significant.
                ival := ival + (v1 |<< 24)
                ival := ival + (v2 |<< 16)
                ival := ival + (v3 |<< 8)
                ival := ival + v4
            end
            Result := c_cast_nat32_to_float ($ival)
        end

--|========================================================================
feature {NONE} -- Externals
--|========================================================================

    c_cast_nat32_to_float (v: POINTER): REAL_32
            -- Reinterpret the 32 bits at address `v' as a C float.
        external
            "C inline"
        alias
            "{
                return (*((float *) ($v)));
            }"
        end
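For comparison, a standalone C sketch of the same idea, hedged accordingly: it assembles the 32-bit pattern with shifts and then copies the bits into a float with `memcpy` rather than casting a pointer, which avoids relying on how the compiler treats type-punned pointers. It assumes the platform's `float` is a 32-bit IEEE 754 value; the name `bytes_to_float32` is hypothetical and not part of any library mentioned in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert four raw bytes into a 32-bit float.
   If little_endian is non-zero, bytes[0] is the least significant byte;
   otherwise bytes[0] is the most significant byte. */
float bytes_to_float32(const unsigned char bytes[4], int little_endian)
{
    uint32_t bits;
    float result;

    /* Assumption of this sketch: float occupies exactly 32 bits. */
    assert(sizeof(float) == sizeof(uint32_t));

    if (little_endian) {
        bits = ((uint32_t)bytes[3] << 24) | ((uint32_t)bytes[2] << 16)
             | ((uint32_t)bytes[1] << 8)  |  (uint32_t)bytes[0];
    } else {
        bits = ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16)
             | ((uint32_t)bytes[2] << 8)  |  (uint32_t)bytes[3];
    }

    /* Copy the bit pattern unchanged into the float. */
    memcpy(&result, &bits, sizeof result);
    return result;
}
```

The same pattern extends to 64-bit values by assembling a `uint64_t` from eight bytes and copying it into a `double`.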

Louis M

May 12, 2023, 10:03:58 AM

to eiffel...@googlegroups.com

As I understand it, `as_integer_32' does not handle the sign bit. Here, I simply assume that the values never exceed 2147483647. If that holds, the sign bit should not be a problem. Considering the content of the files, it should always hold.
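To make the assumption concrete, here is a small C sketch (not Eiffel) of the two behaviours in question. `checked_to_int32` states the "never exceeds 2147483647" assumption as an assertion, while `reinterpret_as_int32` shows what a purely bit-preserving conversion does to larger values. Both names are hypothetical, and whether `as_integer_32' behaves like either one is not established in this thread.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert only when the value fits; the assertion is the assumption above. */
int32_t checked_to_int32(uint32_t u)
{
    assert(u <= (uint32_t)INT32_MAX);
    return (int32_t)u;
}

/* Keep the 32 bits unchanged and read them as a two's complement integer.
   Values above INT32_MAX come out negative. */
int32_t reinterpret_as_int32(uint32_t u)
{
    int32_t s;

    memcpy(&s, &u, sizeof s);
    return s;
}
```

For the header values in the MNIST files (a magic number and a few dimension sizes), both functions return the same result, which is why the assumption is harmless there.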

As for the reader, its purpose is to visualize the information in the MNIST files. It looks like this:

(screenshot not preserved)

I can scroll through the images and the labels to see what information is in the files. I used this program in an AI presentation that I gave, and to help one of my students who was programming the backpropagation algorithm.

Good day,

Louis M