Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Bug#1034816: aide aborts with error "realloc: failed to allocate memory", exit code 22

45 views
Skip to first unread message

Thomas Dorner

unread,
Apr 25, 2023, 5:10:04โ€ฏAM4/25/23
to
Package: aide
Version: 0.18.2-1
Severity: important
X-Debbugs-Cc: debia...@th-dorner.de

Dear Maintainer (that's you Marc, isn't it ;-),

The last two daily aide runs on my desktop machine failed with an error
22. Today's log-file shows:

aide run on XXXXX.XXXXX started at 2023-04-25 06:38:52.
******************************************************************************
* AIDE returned with exit code 22. unknown non-zero exit value 22 *
******************************************************************************
AIDE error output (465 lines):
WARNING: hash calculation: '/run/user/1065/dconf/user' has been changed (changed attributes: c+m+i, hash could not be calculated)
WARNING: hash calculation: '/var/backups/apt.extended_states.3.gz' has been changed (changed attributes: s+c+m+i, hash could not be calculated)
WARNING: hash calculation: '/var/backups/apt.extended_states.6.gz' has been changed (changed attributes: s+c+m+i, hash could not be calculated)
WARNING: hash calculation: '/var/backups/apt.extended_states.4.gz' has been changed (changed attributes: s+c+m+i, hash could not be calculated)
WARNING: hash calculation: '/var/backups/apt.extended_states.1.gz' has been changed (changed attributes: s+c+m+i, hash could not be calculated)
WARNING: hash calculation: '/var/backups/apt.extended_states.2.gz' has been changed (changed attributes: s+c+m+i, hash could not be calculated)
WARNING: hash calculation: '/var/backups/apt.extended_states.5.gz' has been changed (changed attributes: s+c+m+i, hash could not be calculated)
WARNING: hash calculation: '/var/backups/apt.extended_states.0' has been changed (changed attributes: i, hash could not be calculated)
WARNING: hash calculation: '/var/lib/systemd/timers/stamp-chkrootkit.timer' has been changed (changed attributes: c+m, hash could not be calculated)
WARNING: hash calculation: '/var/log/chkrootkit/log.today' has been changed (changed attributes: c+m, hash could not be calculated)
WARNING: hash calculation: '/var/log/chkrootkit/chkrootkit-daily.log' has been changed (changed attributes: c+m, hash could not be calculated)
WARNING: hash calculation: '/var/log/chkrootkit/log.today.raw' has been changed (changed attributes: c+m, hash could not be calculated)
[...some more normal warnings...]
WARNING: tried to read access ACL on /var/opt/lxc/no-network/usr/share/doc/libgpg-error0 but failed with: No such file or directory
WARNING: tried to read default ACL on /var/opt/lxc/no-network/usr/share/doc/libgpg-error0 but failed with: Invalid argument
WARNING: listxattrs failed for /var/opt/lxc/no-network/usr/share/doc/libgpg-error0:No such file or directory
WARNING: lgetfilecon_raw failed for /var/opt/lxc/no-network/usr/share/doc/libgpg-error0: No such file or directory
ERROR: realloc: failed to allocate memory
End of AIDE error output
AIDE detected no changes.

End of AIDE daily aide check at 2023-04-25 08:13, run time 5672 seconds
funny, AIDE did not leave a log.

Version 0.18.2-1 had been installed on 2023-04-21, so it did run OK at
least two times. It also run OK after a manual "systemctl start
dailyaidecheck" in a terminal window yesterday. This did not work today
though.

The last warnings like the 4 last ones above all come from a test
directory used by my current project. The files and directories there
have been deleted and recreated several times during the aide run.

Memory of the machine should be enough with 32 GB mostly used as cache:

MiB Mem: 32025.9 total, 1145.8 free, 3102.1 used, 28268.6 buff/cache

I think I've always be working on the project during the failures, so
it maybe a reallocation for a deleted or recreated entry. Currently
I've no further idea to narrow it down.

The directory /var/lib/aide/dailyaidecheck with the temporary log-files
has been removed, but maybe I can modify something to keep it to maybe
get further info.


-- System Information:
Debian Release: 12.0
APT prefers unstable
APT policy: (500, 'unstable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 6.1.0-8-amd64 (SMP w/16 CPU threads; PREEMPT)
Kernel taint flags: TAINT_CPU_OUT_OF_SPEC
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE=en_GB:en
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages aide depends on:
ii libacl1 2.3.1-3
ii libaudit1 1:3.0.9-1
ii libc6 2.36-9
ii libcap2 1:2.66-3
ii libext2fs2 1.47.0-2
ii libmhash2 0.9.9.9-9
ii libpcre2-8-0 10.42-1
ii libselinux1 3.4-1+b5
ii zlib1g 1:1.2.13.dfsg-1

Versions of packages aide recommends:
ii aide-common 0.18.2-1

Versions of packages aide suggests:
pn figlet <none>

-- no debconf information

Hannes von Haugwitz

unread,
Apr 25, 2023, 4:40:06โ€ฏPM4/25/23
to
Hi Thomas,

On Tue, Apr 25, 2023 at 10:54:39AM +0200, Thomas Dorner wrote:
> The last two daily aide runs on my desktop machine failed with an error
> 22.

How many files are in the AIDE database on a successful run? Does this
number significantly differ when the aide check fails?

> Version 0.18.2-1 had been installed on 2023-04-21, so it did run OK at
> least two times. It also run OK after a manual "systemctl start
> dailyaidecheck" in a terminal window yesterday. This did not work today
> though.

Is 0.18.2-1 the only version you experience this behaviour or does this
error also occur with an older version?

> The last warnings like the 4 last ones above all come from a test
> directory used by my current project. The files and directories there
> have been deleted and recreated several times during the aide run.

Independently of the issue above, it might make sense to exclude this
directory.

Best regards

Hannes

Thomas Dorner

unread,
Apr 26, 2023, 2:00:05โ€ฏAM4/26/23
to
Hello Hannes,

thanks for the quick response.

> How many files are in the AIDE database on a successful run? Does this
> number significantly differ when the aide check fails?

You mean the /var/lib/aide/aide.db?
# zcat /var/lib/aide/aide.db | wc
755240 21146627 442199792

The /var/lib/aide/aide.db after a failed one is always size 0.

> Is 0.18.2-1 the only version you experience this behaviour or does
> this error also occur with an older version?

I've never encountered this before, but I did not work with the
specific directory tree parallel to the AIDE run for at least 3 weeks
before the this one.

> Independently of the issue above, it might make sense to exclude this
> directory.

That's the work-around I'll put in place till you try something
different.

Best regards, Thomas
--
๐“ฃ๐“ฑ๐“ธ๐“ถ๐“ช๐“ผ ๐““๐“ธ๐“ป๐“ท๐“ฎ๐“ป

Hannes von Haugwitz

unread,
Apr 26, 2023, 2:10:04โ€ฏPM4/26/23
to
Hello Thomas,

On Wed, Apr 26, 2023 at 07:46:40AM +0200, Thomas Dorner wrote:
> > How many files are in the AIDE database on a successful run? Does this
> > number significantly differ when the aide check fails?
>
> You mean the /var/lib/aide/aide.db?
> # zcat /var/lib/aide/aide.db | wc
> 755240 21146627 442199792

This shouldn't be large enough to fill up 32 GB of memory.

Can you try to reproduce the failure and verify that the memory is
actually used up by the aide process?

> > Is 0.18.2-1 the only version you experience this behaviour or does
> > this error also occur with an older version?
>
> I've never encountered this before, but I did not work with the
> specific directory tree parallel to the AIDE run for at least 3 weeks
> before the this one.

Additionally can you try to directly call aide limited to the specific
directory (see --limit option).

Best regards

hannes

Thomas Dorner

unread,
Apr 28, 2023, 12:00:04โ€ฏPM4/28/23
to
Hello Hannes!

> Can you try to reproduce the failure and verify that the memory is
> actually used up by the aide process?

Well today I run it with dmalloc. I'll need some time to handle the 28
GB of log-file, especially as it apparently contains at least one large
chunk of 0-bytes.

> Additionally can you try to directly call aide limited to the specific
> directory (see --limit option).

That probably doesn't give me enough time to delete the directory after
starting aide, but before aide actually gets there.

I'll follow up on this when I know more.
Best regards, Thomas
--
๐“ฃ๐“ฑ๐“ธ๐“ถ๐“ช๐“ผ ๐““๐“ธ๐“ป๐“ท๐“ฎ๐“ป

Thomas Dorner

unread,
Apr 29, 2023, 7:40:04โ€ฏAM4/29/23
to
> Well today I run it with dmalloc. I'll need some time to handle the
> 28 GB of log-file, especially as it apparently contains at least one
> large chunk of 0-bytes.

OK, so that log file looks inconspicuous. The reallocations are a few
various followed by one series of one permanently growing big one (up
to 1792464 bytes; maybe reading in the DB into the internal
representation?), a growing small one (growing in 4 bytes steps up to
112 bytes), a huge gap without any reallocations, lots (~55% of all) of
reallocations from 257 to something smaller and 4 ones from 99 bytes to
190 bytes). Unfortunately that wasn't the end of the log file, there
were still ~2900 other memory actions after it.ยน

I'll try something different when the failure occurs again - today it
didn't (maybe my timing was wrong), so it probably will be a few days.

Best regards, Thomas
ยน My (maybe completely wrong ;-) theory would be that dmalloc only logs
after the call returned and it failed in the call itself. (I logged
using the tags log-trans and log-stats aka 0x9.)
--
๐“ฃ๐“ฑ๐“ธ๐“ถ๐“ช๐“ผ ๐““๐“ธ๐“ป๐“ท๐“ฎ๐“ป

Thomas Dorner

unread,
May 10, 2023, 3:51:15โ€ฏAM5/10/23
to
Dear maintainers,

after some unsuccessful tries to monitor the failed call to realloc I
finally took the sources and added some monitoring into the aide binary
itself. (Yes, I'm writing about classic dirty printf debugging. ;-)

And today I was successful, the problem is a call to checked_realloc in
util.c with a request of 0 bytes. Before that the log contained a few
warnings about a missing sub-directory (sys) of the missing upper
directory (no-network).

WARNING: tried to read access ACL on /var/opt/lxc/no-network/sys but failed with: No such file or directory
WARNING: tried to read default ACL on /var/opt/lxc/no-network/sys but failed with: Invalid argument
WARNING: listxattrs failed for /var/opt/lxc/no-network/sys:No such file or directory
WARNING: lgetfilecon_raw failed for /var/opt/lxc/no-network/sys: No such file or directory
checked_realloc(0x7fe3c9145380, 0)
failed to reallocate 0 bytes
ERROR: realloc: failed to allocate memory
End of AIDE error output

A first glance at all the calls to checked_realloc makes me guess this
can only happen in one of the calls in do_md.c, all others should be >0.

Hope this helps & best regards, Thomas
--
๐“ฃ๐“ฑ๐“ธ๐“ถ๐“ช๐“ผ ๐““๐“ธ๐“ป๐“ท๐“ฎ๐“ป

Thomas Dorner

unread,
May 11, 2023, 12:00:05โ€ฏPM5/11/23
to
Dear maintainers,

I narrowed it further down with some more fprintfs. The problem is not
in do_md.c but the call in hsymlnk in gen_list.c. Here the critical
part of the output:

gen_list.c: hsymlnk: '/var/opt/lxc/no-network/lib64', ''
gen_list.c: hsymlnk: len+1 == 0
checked_realloc(0x7fe4f1117290, 0)
failed to reallocate 0 bytes
ERROR: realloc: failed to allocate memory

The variables in the first fprintf are line->fullpath and
line->linkname.

At the beginning of the aide run /var/opt/lxc/no-network/lib64 is the
following symbolic link (basically an identical copy of the root
/lib64):

lrwxrwxrwx 1 ... /var/opt/lxc/no-network/lib64 -> usr/lib64/

Later (about 30 minutes after the start of dailyaidecheck) the whole
"no-network" tree is deleted. This apparently leads to an empty
linkname, readlink returns -1, which then results in a call to realloc
with size 0.

(I think the rest is now up to you. ;-)

Best regards, Thomas
--
๐“ฃ๐“ฑ๐“ธ๐“ถ๐“ช๐“ผ ๐““๐“ธ๐“ป๐“ท๐“ฎ๐“ป

Hannes von Haugwitz

unread,
May 11, 2023, 12:10:06โ€ฏPM5/11/23
to
tags 1034816 - moreinfo unreproducible
thanks

Hi Thomas,

On Thu, May 11, 2023 at 05:52:01PM +0200, Thomas Dorner wrote:
> I narrowed it further down with some more fprintfs. The problem is not
> in do_md.c but the call in hsymlnk in gen_list.c.

Yes, yesterday I was able to reproduce your issue.

Please try the patch available upstream[0] and report back if it fixes
the memory allocation errors.

Thanks for debugging.

Best regards

Hannes

[0] https://github.com/aide/aide/commit/61778cdb42b88ab9591e43bf8de39693d545a278

Thomas Dorner

unread,
May 12, 2023, 4:31:06โ€ฏAM5/12/23
to
Hi Hannes!

> Please try the patch available upstream[0] and report back if it fixes
> the memory allocation errors.

Looks good:

[...]
WARNING: lgetfilecon_raw failed for /var/opt/lxc/no-network: No such file or directory
WARNING: tried to read access ACL on /var/opt/lxc/no-network/sys but failed with: No such file or directory
WARNING: tried to read default ACL on /var/opt/lxc/no-network/sys but failed with: Invalid argument
WARNING: listxattrs failed for /var/opt/lxc/no-network/sys:No such file or directory
WARNING: lgetfilecon_raw failed for /var/opt/lxc/no-network/sys: No such file or directory
WARNING: readlink() failed for '/var/opt/lxc/no-network/lib64': No such file or directory
WARNING: listxattrs failed for /var/opt/lxc/no-network/lib64:No such file or directory
WARNING: lgetfilecon_raw failed for /var/opt/lxc/no-network/lib64: No such file or directory
WARNING: readlink() failed for '/var/opt/lxc/no-network/lib32': No such file or directory
WARNING: listxattrs failed for /var/opt/lxc/no-network/lib32:No such file or directory
WARNING: lgetfilecon_raw failed for /var/opt/lxc/no-network/lib32: No such file or directory
WARNING: readlink() failed for '/var/opt/lxc/no-network/bin': No such file or directory
WARNING: listxattrs failed for /var/opt/lxc/no-network/bin:No such file or directory
WARNING: lgetfilecon_raw failed for /var/opt/lxc/no-network/bin: No such file or directory
[...]

> Thanks for debugging.

You're welcome.

Best regards, Thomas
--
๐“ฃ๐“ฑ๐“ธ๐“ถ๐“ช๐“ผ ๐““๐“ธ๐“ป๐“ท๐“ฎ๐“ป
0 new messages