Fixing valgrid errors by setting INIT_HTAB=1

19 views
Skip to first unread message

Arnaldo Leon

unread,
May 17, 2020, 7:27:31 PM5/17/20
to h5py
Hi all,

It turns out I managed to fix a very annoying Valgrind error by setting  INIT_HTAB=1 in lzfP.h:91. It seems that the author of the LZF library wanted to squeeze a little more performance out of the compression speed, but he did so at the cost of reading un-initialized memory. By setting this flag, the htab variable is set to zero before being used. Otherwise you'd get tons of "Conditional jump or move depends on uninitialised value(s)".

Another side benefit is that now the compressed results are identical between runs. Before it was the case that h5diff would report differences, but they were only due to differences in the compressed output, not in the original source data.

This is a one-line change in lzfP.h.. Should I just create a pull request for this?

Kind Regards

-aml

Thomas Caswell

unread,
Jun 26, 2020, 7:35:14 PM6/26/20
to h5py
I have some concerns, both about diverging from the upstream library and about the exact change here.

If the output data is being changed by zeroing parts of memory is it because there is some entropy being pulled from those bits that is doing useful work or is it noise that is in a region of memory that is there for padding?

Tom 

--
You received this message because you are subscribed to the Google Groups "h5py" group.
To unsubscribe from this group and stop receiving emails from it, send an email to h5py+uns...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/h5py/e3db2972-af0a-439e-837b-c58adb3ce0e8%40googlegroups.com.


--
Thomas Caswell
tcas...@gmail.com

Arnaldo Leon

unread,
Jun 27, 2020, 3:16:07 PM6/27/20
to h5py
Hi Tom,

I believe the author of LZF was trying to squeeze some performance by not having to initialize memory. Considering that performance was paramount for this library is seem like a sensible trade-off. There is an post about it here:


The main motivation behind cleaning this up would be to reduce the number of valgrind errors and determinism of the output. The reason the latter is important is that tools like h5diff seems to report differences with LZF compressed files even though the data is the same, which is a bit annoying.

-aml
Reply all
Reply to author
Forward
0 new messages