Re: [m4ri] valgrind errors on printing output matrix of mzd_transpose

48 views
Skip to first unread message

Martin Albrecht

unread,
Apr 12, 2013, 10:35:26 AM4/12/13
to Grégory Landais, M4RI Development
Hi Grégory (CC: m4ri-devel),

On Friday 12 Apr 2013, Grégory Landais wrote:
> Dear Martin,
>
> I've been using m4ri as a library for my C program for some time and I'm
> facing some problems. I tried to hunt the causes and for the moment I'm
> stuck with this one :
>
> When printing a transposed random matrix, I get the following errors
> from valgrind :
>
> ~/tmp/test_m4ri $ valgrind --track-origins=yes ./main4 > /dev/null
> ==8605== Memcheck, a memory error detector
> ==8605== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
> ==8605== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for
> copyright info ==8605== Command: ./main4
> ==8605==
> ==8605== Use of uninitialised value of size 8
> ==8605== at 0x3400244E4B: _itoa_word (_itoa.c:195)
> ==8605== by 0x3400247A87: vfprintf (vfprintf.c:1616)
> ==8605== by 0x3400250659: printf (printf.c:35)
> ==8605== by 0x4009A0: main (main4.c:26)
> ==8605== Uninitialised value was created by a stack allocation
> ==8605== at 0x403D62: _mzd_copy_transpose_lt64x64 (mzd.c:580)
> ==8605==
> ==8605== Conditional jump or move depends on uninitialised value(s)
> ==8605== at 0x3400244E55: _itoa_word (_itoa.c:195)
> ==8605== by 0x3400247A87: vfprintf (vfprintf.c:1616)
> ==8605== by 0x3400250659: printf (printf.c:35)
> ==8605== by 0x4009A0: main (main4.c:26)
> ==8605== Uninitialised value was created by a stack allocation
> ==8605== at 0x403D62: _mzd_copy_transpose_lt64x64 (mzd.c:580)
> ==8605==
> ==8605==
> ==8605== HEAP SUMMARY:
> ==8605== in use at exit: 0 bytes in 0 blocks
> ==8605== total heap usage: 55 allocs, 55 frees, 1,060,184 bytes allocated
> ==8605==
> ==8605== All heap blocks were freed -- no leaks are possible
> ==8605==
> ==8605== For counts of detected and suppressed errors, rerun with: -v
> ==8605== ERROR SUMMARY: 640 errors from 2 contexts (suppressed: 4 from 4)
>
> Here is a minimal code that produces the errors :
>
> #include <stdio.h>
> #include <stdlib.h>
> #include "m4ri.h"
>
> int main()
> {
> int r = 144;
> int l = 10;
> int i, j;
> mzd_t* A = mzd_init(r, r-l);
> mzd_t* AT = mzd_init(r-l, r);
> mzd_randomize(AT);
>
> mzd_transpose(A, AT);
>
> BIT a;
> for (i = 0; i < AT->nrows; ++i) {
> for (j = 0; j < AT->ncols; ++j) {
> a = mzd_read_bit(AT, i, j);
> printf("%d", a);
> }
> }
> for (i = 0; i < A->nrows; ++i) {
> for (j = 0; j < A->ncols; ++j) {
> a = mzd_read_bit(A, i, j);
> printf("%d", a);
> }
> }
>
> mzd_free(A);
> mzd_free(AT);
> return 0;
> }
>
>
> I link with changeset 8f08bc8469ed and I build m4ri this way :
>
> autoreconf --install
> ./configure
> CFLAGS="-g -O0" make
>
> and my code this way :
>
> gcc -o main4 -O0 -Wall -Wextra -std=gnu99 -g main4.c
> /home/ROCQ/secret/landais/Travaux/dev/m4ri/.libs/libm4ri.a
> -I /home/ROCQ/secret/landais/Travaux/dev/m4ri/src/ -I
> /home/ROCQ/secret/landais/Travaux/dev/m4ri/ -lm -lpng
>
> So this do not seem critical (since the transposition is correct) but it
> may hide something deeper.

I can confirm this behaviour and kinda identified where things go wrong. The
array t in _mzd_copy_transpose_lt64x64 is not initialised completely if n<32
because we don't need to (but apparently the code accesses uninitialised parts
anyway). We'll investigate, thanks for reporting this.

> Second point :
>
> In my program, I am trying to do "partial" full echelonization i.e. I
> want to fully echelonize my matrices but stop when a fixed number of
> columns is proccessed. I am currently patching brilliantrussian.c to
> achieve that but this is not a sustainable solution. Do you think this
> feature could be included in some way in m4ri or do you have an other
> idea to achieve what I want ?

Patching brilliantrussian.c would on give you n^3/log(n), not fully
asymptotically fast, so it's not optimal (but maybe okay for your
dimensions?). What do you mean by "stop when a fixed number of columns is
processed", do you mean you only want to deal with the first, say, C=1024
columns? In that case, I'd do something like this (which is pieced together
from PLE decomposition and RREF computaiton):

- setup a matrix window of width C, i.e.

A = [A0|A1]

- run PLUQ decomposition on

A0 -> [L\E]

- apply the transformation matrices L and P to pivot rows in A1

-> A1 = P * L^-1 * A1

- apply to transformation to non-pivot rows in A1

- do a left TRSM on E and A1 (and rest of A0) to reduce non-pivots to reduced
RREF

- apply Q to undo permutation shifts

So it's essentially like the "if (r1)" block in ple.c in line 131 + the
"if(full)" block in line 43 of echelonform.c. Let me know if that's too vague,
though.

Cheers,
Martin

--
name: Martin Albrecht
_pgp: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_otr: 47F43D1A 5D68C36F 468BAEBA 640E8856 D7951CCF
_www: http://martinralbrecht.wordpress.com/
_jab: martinr...@jabber.ccc.de

Martin Albrecht

unread,
Apr 12, 2013, 11:09:28 AM4/12/13
to m4ri-...@googlegroups.com, Grégory Landais, martinr...@googlemail.com
I have opened a new ticket for the issue reported:

Carlo Wood

unread,
Apr 12, 2013, 1:35:14 PM4/12/13
to m4ri-...@googlegroups.com, martinr...@googlemail.com, Grégory Landais
I'm convinced that valgrind is wrong; see my private email to you (that
I sent before I saw this, or I would have send it to the list).

Nevertheless, let me repeat what I said:

If you initialize the stack array 'word t[64]' in
_mzd_copy_transpose_lt64x64 with all zeroes or all ones, then the
valgrind error goes away - but the output/result remains the same.

What happens imho is that bits of this array that are outside the
matrix (remember that every matrix is stored in an integer number of
words, and thus stored in a multiple of 64 bits), are shifted into
another word by __M4RI_GET_BIT, confusing valgrind to think that word
depend on uninitialized data, while in fact only the least significant
bit is relevant since we bit-wise AND it with 1.

On Fri, 12 Apr 2013 08:09:28 -0700 (PDT)
Martin Albrecht <martinr...@googlemail.com> wrote:

> I have opened a new ticket for the issue reported:
>
> https://bitbucket.org/malb/m4ri/issue/53/use-of-uninitialised-values-in

--
Carlo Wood <ca...@alinoe.com>

Martin Albrecht

unread,
Apr 16, 2013, 11:23:39 AM4/16/13
to Grégory Landais, M4RI Development
Hey,

yep, that looks like bug.

However, I am inclined to flag this "won't fix" and to add an assert(A->offset
== 0) to the beginning of mzd_to_png.

Fixing this would increase code complexity (and possibly a bit of performance
loss).

I am starting to think that allowing A->offset != 0 was not a good design
decision (M4RIE, for example, does fine without allow this). Hence, I am
thinking about actually *removing* offset != 0 which would make the code base
smaller.

The only place where offset != 0 is needed is in TRSM upper left when we
compute the RREF from PLUQ. But this case can easily be dealt with by some
temporary buffer (this is done in M4RIE).

What do others think?

On Tuesday 16 Apr 2013, you wrote:
> Maybe I found another one :
>
> I attached a C file and an input png file. I think the function
> mzd_to_png doesn't deal correctly with windows. As you will see in the
> output png file, the 6 first columns are zeroed while they are not if
> you look at stdout.
>
> I believe it is a misalignement of words since 134 (width of the
> diagonal) mod 64 = 6

Clement Pernet

unread,
Apr 17, 2013, 3:20:40 AM4/17/13
to m4ri-...@googlegroups.com, Martin Albrecht, Grégory Landais
Hi there,

I am probably partly responsible for this offset design. I agree with Martin that
- it makes the code way too complicated and probably slower
- it could be avoided by using column block splitting aligned on 64 bits, which is in most cases
possible (except the one with RREF from PLUQ, and possibly one or two other places but temporary
buffers are maybe a good alternative).
Ok to get rid of them!
Cheers
Cl�ment


Le 16/04/2013 17:23, Martin Albrecht a �crit :
Reply all
Reply to author
Forward
0 new messages