Hi Grégory (CC: m4ri-devel),
On Friday 12 Apr 2013, Grégory Landais wrote:
> Dear Martin,
>
> I've been using m4ri as a library for my C program for some time and I'm
> facing some problems. I tried to hunt the causes and for the moment I'm
> stuck with this one :
>
> When printing a transposed random matrix, I get the following errors
> from valgrind :
>
> ~/tmp/test_m4ri $ valgrind --track-origins=yes ./main4 > /dev/null
> ==8605== Memcheck, a memory error detector
> ==8605== Copyright (C) 2002-2009, and GNU GPL'd, by Julian Seward et al.
> ==8605== Using Valgrind-3.6.0.SVN-Debian and LibVEX; rerun with -h for
> copyright info ==8605== Command: ./main4
> ==8605==
> ==8605== Use of uninitialised value of size 8
> ==8605== at 0x3400244E4B: _itoa_word (_itoa.c:195)
> ==8605== by 0x3400247A87: vfprintf (vfprintf.c:1616)
> ==8605== by 0x3400250659: printf (printf.c:35)
> ==8605== by 0x4009A0: main (main4.c:26)
> ==8605== Uninitialised value was created by a stack allocation
> ==8605== at 0x403D62: _mzd_copy_transpose_lt64x64 (mzd.c:580)
> ==8605==
> ==8605== Conditional jump or move depends on uninitialised value(s)
> ==8605== at 0x3400244E55: _itoa_word (_itoa.c:195)
> ==8605== by 0x3400247A87: vfprintf (vfprintf.c:1616)
> ==8605== by 0x3400250659: printf (printf.c:35)
> ==8605== by 0x4009A0: main (main4.c:26)
> ==8605== Uninitialised value was created by a stack allocation
> ==8605== at 0x403D62: _mzd_copy_transpose_lt64x64 (mzd.c:580)
> ==8605==
> ==8605==
> ==8605== HEAP SUMMARY:
> ==8605== in use at exit: 0 bytes in 0 blocks
> ==8605== total heap usage: 55 allocs, 55 frees, 1,060,184 bytes allocated
> ==8605==
> ==8605== All heap blocks were freed -- no leaks are possible
> ==8605==
> ==8605== For counts of detected and suppressed errors, rerun with: -v
> ==8605== ERROR SUMMARY: 640 errors from 2 contexts (suppressed: 4 from 4)
>
> Here is a minimal code that produces the errors :
>
> #include <stdio.h>
> #include <stdlib.h>
> #include "m4ri.h"
>
> int main()
> {
> int r = 144;
> int l = 10;
> int i, j;
> mzd_t* A = mzd_init(r, r-l);
> mzd_t* AT = mzd_init(r-l, r);
> mzd_randomize(AT);
>
> mzd_transpose(A, AT);
>
> BIT a;
> for (i = 0; i < AT->nrows; ++i) {
> for (j = 0; j < AT->ncols; ++j) {
> a = mzd_read_bit(AT, i, j);
> printf("%d", a);
> }
> }
> for (i = 0; i < A->nrows; ++i) {
> for (j = 0; j < A->ncols; ++j) {
> a = mzd_read_bit(A, i, j);
> printf("%d", a);
> }
> }
>
> mzd_free(A);
> mzd_free(AT);
> return 0;
> }
>
>
> I link with changeset 8f08bc8469ed and I build m4ri this way :
>
> autoreconf --install
> ./configure
> CFLAGS="-g -O0" make
>
> and my code this way :
>
> gcc -o main4 -O0 -Wall -Wextra -std=gnu99 -g main4.c
> /home/ROCQ/secret/landais/Travaux/dev/m4ri/.libs/libm4ri.a
> -I /home/ROCQ/secret/landais/Travaux/dev/m4ri/src/ -I
> /home/ROCQ/secret/landais/Travaux/dev/m4ri/ -lm -lpng
>
> So this do not seem critical (since the transposition is correct) but it
> may hide something deeper.
I can confirm this behaviour and kinda identified where things go wrong. The
array t in _mzd_copy_transpose_lt64x64 is not initialised completely if n<32
because we don't need to (but apparently the code accesses uninitialised parts
anyway). We'll investigate, thanks for reporting this.
> Second point :
>
> In my program, I am trying to do "partial" full echelonization i.e. I
> want to fully echelonize my matrices but stop when a fixed number of
> columns is proccessed. I am currently patching brilliantrussian.c to
> achieve that but this is not a sustainable solution. Do you think this
> feature could be included in some way in m4ri or do you have an other
> idea to achieve what I want ?
Patching brilliantrussian.c would on give you n^3/log(n), not fully
asymptotically fast, so it's not optimal (but maybe okay for your
dimensions?). What do you mean by "stop when a fixed number of columns is
processed", do you mean you only want to deal with the first, say, C=1024
columns? In that case, I'd do something like this (which is pieced together
from PLE decomposition and RREF computaiton):
- setup a matrix window of width C, i.e.
A = [A0|A1]
- run PLUQ decomposition on
A0 -> [L\E]
- apply the transformation matrices L and P to pivot rows in A1
-> A1 = P * L^-1 * A1
- apply to transformation to non-pivot rows in A1
- do a left TRSM on E and A1 (and rest of A0) to reduce non-pivots to reduced
RREF
- apply Q to undo permutation shifts
So it's essentially like the "if (r1)" block in ple.c in line 131 + the
"if(full)" block in line 43 of echelonform.c. Let me know if that's too vague,
though.
Cheers,
Martin
--
name: Martin Albrecht
_pgp:
http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x8EF0DC99
_otr: 47F43D1A 5D68C36F 468BAEBA 640E8856 D7951CCF
_www:
http://martinralbrecht.wordpress.com/
_jab:
martinr...@jabber.ccc.de