reading half precision floating point numbers & floating point bit manipulation

Ryan Gardner

Apr 23, 2013, 7:55:08 AM

to julia...@googlegroups.com

I need to read in a large file full of half-precision (binary16) floating point numbers.

I have two questions:

1) Is there a clean way to do this directly that I'm missing (e.g. "read(fstream, Float16, size)" where Float16 exists)?

2) If not, what's the cleanest way to create floating point numbers from raw bits or convert these values manually? (I.e., I don't want 0x00000010 to be converted to 16.0, but rather the appropriate 32-bit floating point value represented by those bits.)

My current plan, in the worst case, is to write a pretty nasty hack that reads the values in as an array of Uint16s. Then, I'll create an array of Uint32s whose bits correspond to the Float32 equivalents of all my half-precision floats by conducting the appropriate bit manipulations. Lastly, I'll convert the array of Uint32s to a pointer to Uint32s, convert the pointer to a pointer to Float32s, and then convert that back to an array of Float32s. :/

I don't want to create the floats using any floating point operations (for example, converting the significand to a Float and then manually multiplying by 2^exp), because I don't want to introduce any additional floating point error in the numbers.

Thanks.

Patrick O'Leary

Apr 23, 2013, 8:56:23 AM

to julia...@googlegroups.com

You can at least get away from the pointer hack part with reinterpret():

julia> reinterpret(Float32, [0x01230124; 0x4a23fd21])
2-element Float32 Array:
2.99392f-38
2.68679f6
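
To make the difference between value conversion and bit reinterpretation concrete, here is a minimal sketch using the 0x00000010 example from the question. It assumes Julia 1.x, where the thread's Uint32 is spelled UInt32; the variable names are only for illustration.

```julia
# Assumes Julia 1.x, where the thread's Uint32 is spelled UInt32.
bits = 0x00000010                      # an 8-digit hex literal is a UInt32

by_value = Float32(bits)               # numeric conversion: the integer 16 becomes 16.0f0
by_bits  = reinterpret(Float32, bits)  # same 32 bits, read as an IEEE 754 single

println(by_value)
println(by_bits)

# reinterpret performs no arithmetic, so the bit pattern round-trips unchanged.
roundtrip = reinterpret(UInt32, by_bits)
println(roundtrip == bits)
```

Because no floating point operation is involved, this route should not add any rounding error of its own.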

Simon Byrne

Apr 23, 2013, 10:09:47 AM

to julia...@googlegroups.com

Yes, I would:

1) read in as Uint16

2) convert(Uint32, x)

3) use bit manipulations to get into Float32 form

4) reinterpret(Float32, y)

For 3 & 4, here's some code I wrote to convert IBM 64-bit floats to IEEE 64-bit floats:

https://gist.github.com/simonbyrne/5443843

Yours probably won't need to be quite as complicated, as both formats are base 2 and normalised.
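
The four steps above could be sketched roughly as follows for binary16 input. This is not the code from the gist, only an illustration under stated assumptions: Julia 1.x spellings (UInt16/UInt32 rather than Uint16/Uint32), the standard IEEE 754 binary16 layout (1 sign bit, 5 exponent bits with bias 15, 10 fraction bits), and a hypothetical helper name half_bits_to_float32. Subnormal halves are handled separately because they have no implicit leading one.

```julia
# Hypothetical helper, not a library function. Assumes Julia 1.x type names.
# Widens the raw bits of an IEEE 754 binary16 value to the equivalent Float32
# using integer operations only, so no rounding error is introduced.
function half_bits_to_float32(h::UInt16)
    sgn  = UInt32(h & 0x8000) << 16        # sign moves from bit 15 to bit 31
    expo = UInt32(h >> 10) & 0x0000001f    # 5-bit exponent field, bias 15
    frac = UInt32(h & 0x03ff)              # 10-bit fraction field

    if expo == 0x1f
        # Inf or NaN: all-ones exponent, fraction moved to the top of the 23-bit field
        bits = sgn | 0x7f800000 | (frac << 13)
    elseif expo != 0
        # Normal number: rebias the exponent from 15 to 127 (add 112)
        bits = sgn | ((expo + UInt32(112)) << 23) | (frac << 13)
    elseif frac == 0
        # Signed zero
        bits = sgn
    else
        # Subnormal half: shift until the leading one reaches bit 10, then drop it
        shift = leading_zeros(frac) - 21
        frac  = (frac << shift) & 0x000003ff
        bits  = sgn | (UInt32(113 - shift) << 23) | (frac << 13)
    end
    return reinterpret(Float32, bits)
end

# Example: three raw half-precision bit patterns, as they might come from a file
raw = UInt16[0x3c00, 0xc000, 0x7bff]
println(half_bits_to_float32.(raw))
```

The raw values themselves could be read with something like read!(io, Vector{UInt16}(undef, n)), assuming the file's byte order matches the machine's. In Julia versions that ship a Float16 type, Float32(reinterpret(Float16, h)) should give the same values and may be the simpler route.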
