Read raw file (big or little endian)

Jimmy Johnson

May 11, 2023, 4:03:51 PM
to Eiffel Users
For a deep learning library, I need to read a raw file in "idx" format, as described at the bottom of the page at http://yann.lecun.com/exdb/mnist/ . It needs to be platform-independent.

According to this format, the first integer is a magic number whose first two bytes should always be zero. I suppose I can read this integer, and if the first two bytes are not zero, then [for this platform] the bytes are arriving in the wrong order. Calling a RAW_FILE feature that reads more than one byte (e.g. RAW_FILE.read_integer or RAW_FILE.read_natural_32) would then give the wrong value. In that case I would have to swap the bytes around. Okay, I can do that.
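A minimal C sketch of that check, assuming the header layout given on the MNIST page: two zero bytes, a type-code byte, then a dimension-count byte, stored most-significant byte first. The helpers `bswap32`, `needs_swap`, `fix_magic`, `idx_type_code` and `idx_dimensions` are hypothetical names. A native read on a little-endian machine moves the two zero bytes into the low half of the value, so nonzero high 16 bits signal that a swap is needed.

```c
#include <stdint.h>

/* Hypothetical helpers; assumes the MSB-first IDX header layout
   0x0000TTDD (TT = type code, DD = number of dimensions). */

/* Reverse the byte order of a 32-bit value. */
static uint32_t bswap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* A correctly ordered magic number has its top 16 bits equal to zero.
   Nonzero bits there mean the bytes arrived in the other order, so every
   multi-byte value in the file must be swapped as well. */
static int needs_swap(uint32_t native_magic)
{
    return (native_magic >> 16) != 0;
}

/* Return the magic number in its intended (MSB-first) interpretation
   and report through *swap whether swapping was required. */
static uint32_t fix_magic(uint32_t native_magic, int *swap)
{
    *swap = needs_swap(native_magic);
    return *swap ? bswap32(native_magic) : native_magic;
}

static unsigned idx_type_code(uint32_t magic)  { return (magic >> 8) & 0xFFu; }
static unsigned idx_dimensions(uint32_t magic) { return magic & 0xFFu; }
```

For example, the images-file header 00 00 08 03 read natively on a little-endian PC gives 0x03080000; `fix_magic` returns 0x00000803 (type 0x08, unsigned byte, with 3 dimensions) and sets the swap flag.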

The problem is: how do I read in bytes and then convert them to a REAL_32 or REAL_64? The NATURAL_xx and INTEGER_xx classes have bit-shift operations, but these operations are not available for the REAL_xx types.
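Since the REAL types lack shift operators, one common approach is to assemble the bits in an unsigned integer of the same width and then reinterpret those bits as a floating-point value. A minimal C sketch, assuming the host uses IEEE 754 `float`/`double` (true on mainstream platforms) and that the file stores reals most-significant byte first like its integers; `be_to_f32` and `be_to_f64` are hypothetical names. `memcpy` does the reinterpretation because it is well-defined in C, unlike dereferencing a pointer cast to a different type.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: 32-bit IEEE 754 float and 64-bit IEEE 754 double. */
_Static_assert(sizeof(float) == 4 && sizeof(double) == 8,
               "expects 32-bit float and 64-bit double");

/* Build a float from 4 bytes stored most-significant first. */
static float be_to_f32(const uint8_t b[4])
{
    uint32_t bits = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
                    ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    float f;
    memcpy(&f, &bits, sizeof f);   /* reinterpret the bit pattern */
    return f;
}

/* Same idea for a double: shift each of the 8 bytes in, MSB first. */
static double be_to_f64(const uint8_t b[8])
{
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | b[i];
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

For instance, the bytes 3F 80 00 00 decode to 1.0f. For little-endian input, index the bytes in the opposite order.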

Does anyone have a canned solution for this? Is there a byte-swapping class in Eiffel?

Thanks,
jjj

r...@amalasoft.com

May 11, 2023, 5:02:55 PM
to eiffel...@googlegroups.com
For raw (and other) reads, I typically use read_stream and pass the file.count value as the length of the stream (i.e. the whole file).
R
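A rough C counterpart of that pattern (read the whole file in one go, then decode from memory), assuming the file fits in RAM; the uncompressed MNIST training-images file is about 47 MB. `read_all` is a hypothetical helper, not part of any library.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read an entire file into a newly allocated buffer.
   Returns NULL on failure; on success stores the byte count in *len.
   The caller must free() the buffer. */
static unsigned char *read_all(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");            /* "b": no newline translation */
    if (!f) return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);                    /* file length in bytes */
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf && fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);                           /* short read: give up */
        buf = NULL;
    }
    fclose(f);
    if (buf) *len = (size_t)size;
    return buf;
}
```

Decoding then works on plain byte offsets into the buffer, which keeps byte-order handling in one place.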

Jimmy Johnson

May 11, 2023, 5:24:40 PM
to Eiffel Users
Okay, how do you convert the values to reals given the sequence of [byte? / CHARACTER_8? / CHARACTER_32?] values.

Louis M

May 11, 2023, 7:57:32 PM
to eiffel...@googlegroups.com

Hello,

Just to let you know, I have already created an MNIST data viewer. You can find it here: https://gitlab.com/tioui/eiffel-mnist-viewer .

You need the Eiffel Game2 library to use the viewer, but note that the MNIST data reader itself is independent of the Game library: https://gitlab.com/tioui/eiffel-mnist-viewer/-/blob/master/mnist_images.e

Also, the method I used to convert the endianness of the data is shown here: https://gitlab.com/tioui/eiffel-mnist-viewer/-/blob/master/integer_reader.e

Good day,

Louis M

r...@amalasoft.com

May 11, 2023, 9:39:39 PM
to eiffel...@googlegroups.com
This might help. Once you have the endianness under control, you should be able to convert to an integer easily enough (multiply and shift, for example).
Attachment: ael_sprt_endian.e
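Once the byte order is known, decoding the value directly from the bytes removes any dependence on the host's own endianness. A C sketch of both forms mentioned above, assuming MSB-first data as in IDX; `be_u32_shift` and `be_u32_mul` are hypothetical names and compute the same result.

```c
#include <stdint.h>

/* Shift form: place each byte at its positional weight (MSB first). */
static uint32_t be_u32_shift(const uint8_t b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}

/* Multiply form: Horner's rule in base 256. */
static uint32_t be_u32_mul(const uint8_t b[4])
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v = v * 256u + b[i];
    return v;
}
```

For the MNIST training-images header bytes 00 00 08 03, both return 2051, and the count bytes 00 00 EA 60 give 60000.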

Jimmy Johnson

May 12, 2023, 1:14:25 AM
to Eiffel Users
Wow!  I think this is just what I was looking for.  Thanks Louis and rfo (Roger?).

I have not tried it yet, but I assume the feature `as_integer_32' in the line
     "last_integer_32 := l_natural.as_integer_32"
handles any sign bit?

Thanks again,
jjj

BTW, Louis, may I ask the larger purpose of the MNIST reader?

r...@amalasoft.com

May 12, 2023, 9:39:03 AM
to eiffel...@googlegroups.com
I whipped this up this morning. Might help.
R (yes, Roger)

   bytes_to_float (bytes: ARRAY [NATURAL_8]; lef: BOOLEAN): REAL_32
         -- Convert Byte sequence into 32 Bit floating point value
         -- If 'lef' then assume bytes are Little Endian, else assume 
         -- they are Big Endian
      require
         valid_count: bytes.count = 4
      local
         ilo: INTEGER
         ival, v1, v2, v3, v4: NATURAL_32
      do
         ilo := bytes.lower

         v1 := bytes.item (ilo).as_natural_32
         v2 := (bytes.item (ilo + 1)).as_natural_32
         v3 := (bytes.item (ilo + 2)).as_natural_32
         v4 := (bytes.item (ilo + 3)).as_natural_32

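            -- Assemble the bit pattern in `ival' (initially 0);
            -- parentheses make the shift-before-add order explicit
         if lef then
            ival := ival + (v4 |<< 24)
            ival := ival + (v3 |<< 16)
            ival := ival + (v2 |<< 8)
            ival := ival + v1
         else
            ival := ival + (v1 |<< 24)
            ival := ival + (v2 |<< 16)
            ival := ival + (v3 |<< 8)
            ival := ival + v4
         end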
         Result := c_cast_nat32_to_float ($ival)
      end

--|========================================================================
feature {NONE} -- Externals
--|========================================================================

   c_cast_nat32_to_float (v: POINTER): REAL_32
         -- Reinterpret the 32 bits stored at `v' as a C `float'.
      external
         "C inline"
      alias
         "{
         return (*((float *) ($v)));
         }"
      end

Louis M

May 12, 2023, 10:03:58 AM
to eiffel...@googlegroups.com

As I understand it, `as_integer_32' does not handle the sign bit. Here, I simply assume that the values are never greater than 2147483647 (the largest INTEGER_32 value); in that case, the sign bit is not a problem. Given the content of the files, that should always hold.
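For the header fields (counts and sizes) that assumption is safe. If a file of type 0x0C (signed 32-bit integers, per the IDX description) were read, though, the assembled unsigned value would need to be reinterpreted as two's complement. A portable C sketch; `u32_to_i32` is a hypothetical helper, written with arithmetic because converting an out-of-range unsigned value to a signed type with a plain cast is implementation-defined in standard C.

```c
#include <stdint.h>

/* Hypothetical helper: interpret a 32-bit pattern as a two's-complement
   signed integer without relying on implementation-defined conversions. */
static int32_t u32_to_i32(uint32_t u)
{
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;                   /* sign bit clear: same value */
    return -(int32_t)(UINT32_MAX - u) - 1;   /* sign bit set: negative */
}
```

The negative branch never overflows: for u = 0x80000000 it computes -INT32_MAX - 1, which is exactly INT32_MIN.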

As for the reader, its purpose is to visualize the information in the MNIST files. It looks like this:

[screenshot of the viewer]

I can scroll through the images and the labels to see what information is in the files. I used this program in an AI presentation I gave, and to help one of my students who was programming the backpropagation algorithm.

Good day,

Louis M
