BIG PROBLEM - runaway syscheckd process

195 views
Skip to first unread message

John A. Sullivan III

unread,
Mar 18, 2009, 12:50:47 PM3/18/09
to ossec...@googlegroups.com
Hello, all. We are suddenly having a bit of a nightmare with our
otherwise usually delightful OSSEC. We've installed it on a dual quad
core AMD server with 32GB of RAM running CentOS 5.2 but with kernel
2.6.28.7 (the CentOS kernel panics with open-iscsi) and VServer
2.3.0.36.7.

After a while, a syscheckd process spins completely out of control
consuming 100% of one processor. It refuses to die. kill does not
work, kill -9 does not work, service ossec stop does not work. Only
rebooting seems to work. The console is flooded with:
BUG: soft lockup - CPU#3 stuck for 61s! [ossec-syscheckd:4625]

The VServer host (the source of the runaway process) is an OSSEC agent.
Originally, the OSSEC server was running as one of its guests but we
thought that was the problem. We moved the OSSEC server to another
piece of hardware yet the problem has persisted.

We are using OSSEC http://www.ossec.net/files/ossec-hids-2.0.tar.gz
downloaded today. Checksum matched.

Here is the log since the last start. Notice that it thinks syscheckd
has stopped:
2009/03/18 11:55:43 ossec-execd: INFO: Started (pid: 4613).
2009/03/18 11:55:43 ossec-agentd(1410): INFO: Reading authentication keys file.
2009/03/18 11:55:43 ossec-agentd: INFO: No previous counter available for 'vserver'.
2009/03/18 11:55:43 ossec-agentd: INFO: Assigning counter for agent vserver01: '0:0'.
2009/03/18 11:55:43 ossec-agentd: INFO: Assigning sender counter: 3:3930
2009/03/18 11:55:43 ossec-agentd: INFO: Started (pid: 4617).
2009/03/18 11:55:43 ossec-agentd: INFO: Server IP Address: 172.x.x.30
2009/03/18 11:55:43 ossec-agentd: INFO: Trying to connect to server (172.x.x.30:1514).
2009/03/18 11:55:44 ossec-agentd(4102): INFO: Connected to the server (172.x.x.30:1514).
2009/03/18 11:55:47 ossec-syscheckd: INFO: Started (pid: 4625).
2009/03/18 11:55:47 ossec-rootcheck: INFO: Started (pid: 4625).
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/messages'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/secure'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/maillog'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/cron'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/basevs/var/log/messages'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/h01/var/log/messages'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/ns02/var/log/messages'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/basevs/var/log/secure'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/h01/var/log/secure'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/ns02/var/log/secure'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/basevs/var/log/maillog'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/h01/var/log/maillog'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/ns02/var/log/maillog'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/basevs/var/log/cron'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/h01/var/log/cron'.
2009/03/18 11:55:49 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/ns02/var/log/cron'.
2009/03/18 11:55:49 ossec-logcollector: INFO: Started (pid: 4621).
2009/03/18 11:58:28 ossec-syscheckd: Error opening directory: '/user/local/sbin': No such file or directory
2009/03/18 11:59:13 ossec-syscheckd: Error opening directory: '/vservers/ns02/user/local/sbin': No such file or directory
2009/03/18 12:01:13 ossec-syscheckd: INFO: Starting syscheck scan (db).
2009/03/18 12:09:53 ossec-syscheckd: INFO: Ending syscheck scan (db).
2009/03/18 12:10:13 ossec-rootcheck: INFO: Starting rootcheck scan.


Here is ossec.conf on the VServer host:
<ossec_config>
<client>
<server-ip>172.30.10.30</server-ip>
</client>

<syscheck>
<!-- Frequency that syscheck is executed - default to every 6 hours -->
<frequency>21600</frequency>
<alert_new_files>yes</alert_new_files>

<!-- Directories to check (perform all possible verifications) -->
<directories check_all="yes">/etc,/usr/bin,/usr/sbin</directories>
<directories check_all="yes">/bin,/sbin,/usr/local/bin,/user/local/sbin,/usr/local/etc</directories>
<directories check_all="yes">/vservers/ns02/etc,/vservers/ns02/usr/bin,/vservers/ns02/usr/sbin</directories>
<directories check_all="yes">/vservers/ns02/bin,/vservers/ns02/sbin,/vservers/ns02/usr/local/bin,/vservers/ns02/user/local/sbin,/vservers/ns02/usr/local/etc</directories>

<!-- Files/directories to ignore -->
<ignore>/etc/mtab</ignore>
<ignore>/etc/mnttab</ignore>
<ignore>/etc/hosts.deny</ignore>
<ignore>/etc/mail/statistics</ignore>
<ignore>/etc/random-seed</ignore>
<ignore>/etc/adjtime</ignore>
<ignore>/etc/httpd/logs</ignore>
<ignore>/etc/utmpx</ignore>
<ignore>/etc/wtmpx</ignore>
<ignore>/etc/cups/certs</ignore>
<ignore>/etc/dumpdates</ignore>
<ignore>/etc/svc/volatile</ignore>

<!-- Windows files to ignore -->
<ignore>C:\WINDOWS/System32/LogFiles</ignore>
<ignore>C:\WINDOWS/Debug</ignore>
<ignore>C:\WINDOWS/WindowsUpdate.log</ignore>
<ignore>C:\WINDOWS/iis6.log</ignore>
<ignore>C:\WINDOWS/system32/wbem/Logs</ignore>
<ignore>C:\WINDOWS/system32/wbem/Repository</ignore>
<ignore>C:\WINDOWS/Prefetch</ignore>
<ignore>C:\WINDOWS/PCHEALTH/HELPCTR/DataColl</ignore>
<ignore>C:\WINDOWS/SoftwareDistribution</ignore>
<ignore>C:\WINDOWS/Temp</ignore>
<ignore>C:\WINDOWS/system32/config</ignore>
<ignore>C:\WINDOWS/system32/spool</ignore>
<ignore>C:\WINDOWS/system32/CatRoot</ignore>
</syscheck>

<rootcheck>
<rootkit_files>/usr/local/ossec/etc/shared/rootkit_files.txt</rootkit_files>
<rootkit_trojans>/usr/local/ossec/etc/shared/rootkit_trojans.txt</rootkit_trojans>
<system_audit>/usr/local/ossec/etc/shared/system_audit_rcl.txt</system_audit>
<system_audit>/usr/local/ossec/etc/shared/cis_debian_linux_rcl.txt</system_audit>
<system_audit>/usr/local/ossec/etc/shared/cis_rhel_linux_rcl.txt</system_audit>
<system_audit>/usr/local/ossec/etc/shared/cis_rhel5_linux_rcl.txt</system_audit>
</rootcheck>
<!-- Files to monitor (localfiles) -->

<localfile>
<log_format>syslog</log_format>
<location>/var/log/messages</location>
</localfile>

<localfile>
<log_format>syslog</log_format>
<location>/var/log/secure</location>
</localfile>

<localfile>
<log_format>syslog</log_format>
<location>/var/log/maillog</location>
</localfile>

<localfile>
<log_format>syslog</log_format>
<location>/var/log/cron</location>
</localfile>

<localfile>
<log_format>syslog</log_format>
<location>/vservers/[a-zA-Z0-9]*/var/log/messages</location>
</localfile>

<localfile>
<log_format>syslog</log_format>
<location>/vservers/[a-zA-Z0-9]*/var/log/secure</location>
</localfile>

<localfile>
<log_format>syslog</log_format>
<location>/vservers/[a-zA-Z0-9]*/var/log/maillog</location>
</localfile>

<localfile>
<log_format>syslog</log_format>
<location>/vservers/[a-zA-Z0-9]*/var/log/cron</location>
</localfile>
</ossec_config>

Any idea what is causing this? How to kill the process without
rebooting? How to fix it?

We're starting to fall behind on this critical project so any help is
greatly appreciated. Thanks - John

--
John A. Sullivan III
Open Source Development Corporation
+1 207-985-7880
jsul...@opensourcedevel.com

http://www.spiritualoutreach.com
Making Christianity intelligible to secular society

Daniel Cid

unread,
Mar 18, 2009, 4:10:49 PM3/18/09
to ossec...@googlegroups.com, jsul...@opensourcedevel.com
Hi John,

It seems to me that syscheck got stuck while trying to read a file
(probably on the middle of a system call since
you can't kill it). On a normal environment we check if the file is
regular (and not a socket, device, etc) before we
read it. However, on a virtual environment this may be failing.

Can you try the following? Before it starts to run out of control
start strace to see what is happening. For example:

root@ourhome:/root# ps auwx |grep syscheckd
root 26897 1.4 0.0 2028 524 ? R 17:05 0:02
/var/ossec/bin/ossec-syscheckd

root@ourhome:/root# strace -F -T -p 26897

It should let us know where it will get stuck after a while...

*you may also want to redirect the output to a file "strace -F -T -p
pid > /tmp/log 2>&1"

Let us know the results and we will try to find out the issue.

Thanks,

--
Daniel B. Cid
dcid ( at ) ossec.net

John A. Sullivan III

unread,
Mar 19, 2009, 2:54:03 PM3/19/09
to Daniel Cid, ossec...@googlegroups.com
Thanks, Daniel. I have the trace but it is a 40 MB file. How shall I
send it to you? - John

Nerijus Krukauskas

unread,
Mar 20, 2009, 2:13:29 AM3/20/09
to ossec...@googlegroups.com
On 19/03/2009, John A. Sullivan III <jsul...@opensourcedevel.com> wrote:
>
> Thanks, Daniel. I have the trace but it is a 40 MB file. How shall I
> send it to you? - John

I believe that if you try to zip it, it's gonna be something around 4 MB... :)

--
http://nk99.org/

Daniel Cid

unread,
Mar 24, 2009, 10:44:21 AM3/24/09
to ossec...@googlegroups.com, jsul...@opensourcedevel.com
Yes, try zipping it and sending to the list (or directly to my email
if you think it may contain confidential
information). It will certainly help us debug this issue.

Thanks,

--
Daniel B. Cid
dcid ( at ) ossec.net

John A. Sullivan III

unread,
Mar 24, 2009, 11:48:09 AM3/24/09
to Daniel Cid, ossec...@googlegroups.com
Here it is. There is another problem. My apologies for wondering why
the list was so slow to respond. I am not receiving any email from the
list including Nerijus' response below. I only received your direct
responses, Daniel. Does one need a gmail account to use googlegroups?

In any event, here is the bzip2 file. Thanks - John

ossec.strace.bz2

John A. Sullivan III

unread,
Mar 30, 2009, 6:58:05 AM3/30/09
to Daniel Cid, ossec...@googlegroups.com
On Tue, 2009-03-24 at 11:49 -0400, John A. Sullivan III wrote:
> Here it is. There is another problem. My apologies for wondering why
> the list was so slow to respond. I am not receiving any email from the
> list including Nerijus' response below. I only received your direct
> responses, Daniel. Does one need a gmail account to use googlegroups?
>
> In any event, here is the bzip2 file. Thanks - John
>
> On Tue, 2009-03-24 at 11:44 -0300, Daniel Cid wrote:
> > Yes, try zipping it and sending to the list (or directly to my email
> > if you think it may contain confidential
> > information). It will certainly help us debug this issue.
> >
> > Thanks,
> >
> > --
> > Daniel B. Cid
> > dcid ( at ) ossec.net
> >
> > On Fri, Mar 20, 2009 at 3:13 AM, Nerijus Krukauskas
> > <nkruk...@gmail.com> wrote:
> > >
> > > On 19/03/2009, John A. Sullivan III <jsul...@opensourcedevel.com> wrote:
> > >>
> > >> Thanks, Daniel. I have the trace but it is a 40 MB file. How shall I
> > >> send it to you? - John
> > >
> > > I believe that if you try to zip it, it's gonna be something around 4 MB... :)
> > >
> > > --
> > > http://nk99.org/
> > >
Hello, all. I do have some more information on this serious bug. It
has now bitten us on two out of two vservers.

We first thought it might have to do with our use of wildcards in the
localfile definitions, e.g.,

<localfile>
<log_format>syslog</log_format>
<location>/vservers/[a-zA-Z0-9]*/var/log/maillog</location>
</localfile>

So we pulled them all out. We still had the same problem. However, it
did seem to be coincidental with not being able to find specified files.
We had mistyped some file names and paths and saw this in the error logs
before the service spun out of control:

2009/03/30 04:57:14 ossec-syscheckd: INFO: Starting syscheck scan (db).
2009/03/30 04:58:41 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.error_log'.
2009/03/30 04:58:41 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.access_log'.
2009/03/30 04:58:41 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/error'.
2009/03/30 04:58:41 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/access'.
2009/03/30 04:58:41 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/slapd-ldap01/errors'.
2009/03/30 05:00:51 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.error_log'.
2009/03/30 05:00:51 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.access_log'.
2009/03/30 05:00:51 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/error'.
2009/03/30 05:00:51 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/access'.
2009/03/30 05:00:51 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/slapd-ldap01/errors'.
2009/03/30 05:03:01 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.error_log'.
2009/03/30 05:03:01 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.access_log'.
2009/03/30 05:03:01 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/error'.
2009/03/30 05:03:01 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/access'.
2009/03/30 05:03:01 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/slapd-ldap01/errors'.
2009/03/30 05:05:11 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.error_log'.
2009/03/30 05:05:11 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.access_log'.
2009/03/30 05:05:11 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/error'.
2009/03/30 05:05:11 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/access'.
2009/03/30 05:05:11 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/slapd-ldap01/errors'.
2009/03/30 05:07:21 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.error_log'.
2009/03/30 05:07:21 ossec-logcollector(1103): ERROR: Unable to open file '/vservers/w01/var/log/httpd/ssipki.access_log'.
2009/03/30 05:07:21 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/error'.
2009/03/30 05:07:21 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/admin-serv/access'.
2009/03/30 05:07:21 ossec-logcollector(1103): ERROR: Unable to open file '/var/log/dirsrv/slapd-ldap01/errors'.
2009/03/30 05:09:32 ossec-logcollector(1904): INFO: File not available, ignoring it: '/vservers/w01/var/log/httpd/ssipki.error_log'.
2009/03/30 05:09:32 ossec-logcollector(1904): INFO: File not available, ignoring it: '/vservers/w01/var/log/httpd/ssipki.access_log'.
2009/03/30 05:09:32 ossec-logcollector(1904): INFO: File not available, ignoring it: '/var/log/dirsrv/admin-serv/error'.
2009/03/30 05:09:32 ossec-logcollector(1904): INFO: File not available, ignoring it: '/var/log/dirsrv/admin-serv/access'.
2009/03/30 05:09:32 ossec-logcollector(1904): INFO: File not available, ignoring it: '/var/log/dirsrv/slapd-ldap01/errors'.
2009/03/30 05:16:10 ossec-syscheckd: INFO: Ending syscheck scan (db).

On our second vserver, we did try wildcards in the directories
definitions. That gave us the following before spinning out of control:
2009/03/30 05:49:22 ossec-syscheckd: Error opening directory: '/user/local/sbin': No such file or directory
2009/03/30 05:49:22 ossec-syscheckd: Error opening directory: '/vservers/*/etc': No such file or directory
2009/03/30 05:49:22 ossec-syscheckd: Error opening directory: '/vservers/*/usr/bin': No such file or directory
2009/03/30 05:49:22 ossec-syscheckd: Error opening directory: '/vservers/*/usr/sbin': No such file or directory
2009/03/30 05:49:22 ossec-syscheckd: Error opening directory: '/vservers/*/bin': No such file or directory
2009/03/30 05:49:22 ossec-syscheckd: Error opening directory: '/vservers/*/sbin': No such file or directory
2009/03/30 05:49:22 ossec-syscheckd: Error opening directory: '/vservers/*/usr/local/bin': No such file or directory
2009/03/30 05:49:22 ossec-syscheckd: Error opening directory: '/vservers/*/user/local/sbin': No such file or directory
2009/03/30 05:49:22 ossec-syscheckd: Error opening directory: '/vservers/*/usr/local/etc': No such file or directory
2009/03/30 05:51:22 ossec-syscheckd: INFO: Starting syscheck scan (db).

Having corrected the paths in the first vserver and taken out the wild
cards, it seems to be behaving itself. However, not being able to use
wild cards or regex's in the directories and localfiles definitions is
certainly inconvenient when we anticipate hundreds of virtual machines
on some of these systems.

That still leaves us with the base problem. It appears that if ossec
syscheckd encounters enough missing files, it does spin out of control
and requires a power cycle of the system to recover. Thanks - John

PS - I'm still not receiving any emails from the mail list.

John A. Sullivan III

unread,
Mar 30, 2009, 7:04:48 AM3/30/09
to Daniel Cid, ossec...@googlegroups.com
Oops! I spoke to soon. The first vserver just went out of control but
again, it is about missing files. We had defined some directories we
knew didn't have any files just in case they were populated in the
future. We would hope we could do that to prevent human error. Here is
what the logs showed before CPU usage spiked to 100%:

2009/03/30 06:22:20 ossec-syscheckd: Error opening directory: '/user/local/sbin': No such file or directory
2009/03/30 06:23:07 ossec-syscheckd: Error opening directory: '/vservers/ns02/user/local/sbin': No such file or directory
2009/03/30 06:23:57 ossec-syscheckd: Error opening directory: '/vservers/w01/user/local/sbin': No such file or directory
2009/03/30 06:25:18 ossec-syscheckd: Error opening directory: '/vservers/pg01/user/local/sbin': No such file or directory
2009/03/30 06:26:43 ossec-syscheckd: Error opening directory: '/vservers/ld01/user/local/sbin': No such file or directory
2009/03/30 06:28:43 ossec-syscheckd: INFO: Starting syscheck scan (db).

John A. Sullivan III

unread,
Mar 30, 2009, 7:10:18 AM3/30/09
to Daniel Cid, ossec...@googlegroups.com
talk about embarassment - I just noticed the typo - however, it again
emphasizes the point that ossec gets very unhappy if it can't find
something that has been defined in ossec.conf - John

John A. Sullivan III

unread,
Mar 30, 2009, 8:05:16 AM3/30/09
to Daniel Cid, ossec...@googlegroups.com

Bad news! The first vserver spun out of control again. This is with all
typos corrected and no wild cards. Here is the log since the last
reboot:

2009/03/30 07:09:44 ossec-execd: INFO: Started (pid: 5743).
2009/03/30 07:09:44 ossec-agentd(1410): INFO: Reading authentication keys file.
2009/03/30 07:09:44 ossec-agentd: INFO: No previous counter available for 'vs01'.
2009/03/30 07:09:44 ossec-agentd: INFO: Assigning counter for agent vserver01: '0:0'.
2009/03/30 07:09:44 ossec-agentd: INFO: Assigning sender counter: 6:4637
2009/03/30 07:09:44 ossec-agentd: INFO: Started (pid: 5747).
2009/03/30 07:09:44 ossec-agentd: INFO: Server IP Address: 172.x.x.30
2009/03/30 07:09:44 ossec-agentd: INFO: Trying to connect to server (172.x.x.30:1514).
2009/03/30 07:09:48 ossec-syscheckd: INFO: Started (pid: 5755).
2009/03/30 07:09:48 ossec-rootcheck: INFO: Started (pid: 5755).
2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/messages'.
2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/secure'.
2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/maillog'.
2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/cron'.
2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/w01/var/log/httpd/ssipkipub.error_log'.
2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/w01/var/log/httpd/ssipkipub.access_log'.
2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/w01/var/log/httpd/error_log'.
2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/w01/var/log/httpd/access_log'.
2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/w01/var/log/httpd/ssl_error_log'.
2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/w01/var/log/httpd/ssl_access_log'.
2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/ld01/var/log/dirsrv/admin-serv/error'.
2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/ld01/var/log/dirsrv/admin-serv/access'.
2009/03/30 07:09:50 ossec-logcollector(1950): INFO: Analyzing file: '/vservers/ld01/var/log/dirsrv/slapd-ldap01/errors'.
2009/03/30 07:09:50 ossec-logcollector: INFO: Started (pid: 5751).
2009/03/30 07:09:59 ossec-agentd(4102): INFO: Connected to the server (172.x.x.30:1514).
2009/03/30 07:19:03 ossec-syscheckd: INFO: Starting syscheck scan (db).
2009/03/30 07:38:01 ossec-syscheckd: INFO: Ending syscheck scan (db).
2009/03/30 07:38:21 ossec-rootcheck: INFO: Starting rootcheck scan.

I suppose this implies it is not about not finding files but something
specific to searching these vserver directories. They should appear as
normal file systems. I will next try it without any vserver directories

John A. Sullivan III

unread,
Mar 30, 2009, 11:01:07 AM3/30/09
to Daniel Cid, ossec...@googlegroups.com
<snip>
Argh! Even worse news. It still hangs - not a single mention of vserver
directories. As far as I can tell, this should be just like a regular
server - we are only scanning the host. No clues in the log files other
than it didn't take long to lock. Here's the log from restart:

2009/03/30 08:11:31 ossec-execd: INFO: Started (pid: 4373).
2009/03/30 08:11:31 ossec-agentd(1410): INFO: Reading authentication keys file.
2009/03/30 08:11:31 ossec-agentd: INFO: No previous counter available for 'vserver01'.
2009/03/30 08:11:31 ossec-agentd: INFO: Assigning counter for agent vserver01: '0:0'.
2009/03/30 08:11:31 ossec-agentd: INFO: Assigning sender counter: 7:3613
2009/03/30 08:11:31 ossec-agentd: INFO: Started (pid: 4377).
2009/03/30 08:11:31 ossec-agentd: INFO: Server IP Address: 172.30.10.30
2009/03/30 08:11:31 ossec-agentd: INFO: Trying to connect to server (172.30.10.30:1514).
2009/03/30 08:11:36 ossec-syscheckd: INFO: Started (pid: 4385).
2009/03/30 08:11:36 ossec-rootcheck: INFO: Started (pid: 4385).
2009/03/30 08:11:37 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/messages'.
2009/03/30 08:11:37 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/secure'.
2009/03/30 08:11:37 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/maillog'.
2009/03/30 08:11:37 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/cron'.
2009/03/30 08:11:37 ossec-logcollector: INFO: Started (pid: 4381).
2009/03/30 08:11:46 ossec-agentd(4102): INFO: Connected to the server (172.30.10.30:1514).
2009/03/30 08:11:57 ossec-logcollector(1225): INFO: SIGNAL Received. Exit Cleaning...
2009/03/30 08:11:57 ossec-syscheckd(1225): INFO: SIGNAL Received. Exit Cleaning...
2009/03/30 08:11:57 ossec-agentd(1225): INFO: SIGNAL Received. Exit Cleaning...
2009/03/30 08:11:57 ossec-execd(1314): INFO: Shutdown received. Deleting responses.
2009/03/30 08:11:57 ossec-execd(1225): INFO: SIGNAL Received. Exit Cleaning...
2009/03/30 08:12:50 ossec-execd: INFO: Started (pid: 5438).
2009/03/30 08:12:50 ossec-agentd(1410): INFO: Reading authentication keys file.
2009/03/30 08:12:50 ossec-agentd: INFO: No previous counter available for 'vserver01'.
2009/03/30 08:12:50 ossec-agentd: INFO: Assigning counter for agent vserver01: '0:0'.
2009/03/30 08:12:50 ossec-agentd: INFO: Assigning sender counter: 7:3623
2009/03/30 08:12:50 ossec-agentd: INFO: Started (pid: 5442).
2009/03/30 08:12:50 ossec-agentd: INFO: Server IP Address: 172.30.10.30
2009/03/30 08:12:50 ossec-agentd: INFO: Trying to connect to server (172.30.10.30:1514).
2009/03/30 08:12:51 ossec-agentd(4102): INFO: Connected to the server (172.30.10.30:1514).
2009/03/30 08:12:54 ossec-syscheckd: INFO: Started (pid: 5450).
2009/03/30 08:12:54 ossec-rootcheck: INFO: Started (pid: 5450).
2009/03/30 08:12:56 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/messages'.
2009/03/30 08:12:56 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/secure'.
2009/03/30 08:12:56 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/maillog'.
2009/03/30 08:12:56 ossec-logcollector(1950): INFO: Analyzing file: '/var/log/cron'.
2009/03/30 08:12:56 ossec-logcollector: INFO: Started (pid: 5446).
2009/03/30 08:17:46 ossec-syscheckd: INFO: Starting syscheck scan (db).
2009/03/30 08:24:22 ossec-syscheckd: INFO: Ending syscheck scan (db).
2009/03/30 08:24:42 ossec-rootcheck: INFO: Starting rootcheck scan.

Where do I look now to solve this problem? Thanks - John

Daniel Cid

unread,
Mar 30, 2009, 3:35:57 PM3/30/09
to John A. Sullivan III, ossec...@googlegroups.com
Hi John,

So, for the first issue (using wildcards), you can do if you update to
the latest snapshot:

http://www.ossec.net/files/snapshots/ossec-hids-090330.tar.gz

For the second issue, by looking at the strace output you sent and the
logs, it is being caused by
rootcheck (that does the rootkit detection) and not by syscheck.
However, rootcheck is called from
inside syscheck and that's why you are seeing the process
ossec-syscheckd going crazy.

If you want to disable rootcheck, just set <disabled> to yes under the
rootcheck configuration and this
problem should go away. Also, by looking at the strace, the CPU was
going very high during the period
of process checking, where it tries to loop through all available pids
and compare the output of
getpid, getpgid, getsid, proc and ps, looking for anomalies... So, it
was not dead of hang .


Thanks,

--
Daniel B. Cid
dcid ( at ) ossec.net

On Mon, Mar 30, 2009 at 12:01 PM, John A. Sullivan III

John A. Sullivan III

unread,
Mar 30, 2009, 3:49:51 PM3/30/09
to Daniel Cid, ossec...@googlegroups.com
Thank you, Daniel. This gives us a usable work around as we can find
other options for rootkit detection. I wonder why the process checking
was causing it such grief and so locked the systems that only a power
cycle was able to stop the runaway process.

The wildcards are great news. I see one uses sregex for ignore
directives and posix wild cards for localfiles. Shall I assume these
remain the same and we have added posix wild cards for directories
directives? - John

ch...@expensify.com

unread,
Nov 26, 2013, 10:31:37 PM11/26/13
to ossec...@googlegroups.com, Daniel Cid
Hi guys,

I'm encountering this problem on an Ubuntu 10.04 server running OSSEC 2.7.0. It even occurred after I disabled rootcheck. I do have wildcards in my syscheck directories config, but I was under the impression from the thread that that bug was fixed already, and I don't see any "No such file or directory" problems. Was there ever a formal bug opened for this syscheckd problem, or a patch that was added to specifically address this issue? Here's the log of cpu soft lockup error; the only error in the ossec log is a mail error that is expected since don't allow this server to mail out. Any ideas?

[132944.273425] BUG: soft lockup - CPU#11 stuck for 61s! [ossec-syscheckd:1604]
[132944.274714] Modules linked in: fbcon tileblit font bitblit softcursor vga16fb vgastate bonding radeon ttm psmouse drm_kms_helper drm serio_raw edac_core i2c_algo_bit edac_mce_amd i2c_piix4 joydev hpilo bnx2 power_meter lp parport usbhid hid ahci pata_atiixp cciss
[132944.274714] CPU 11:
[132944.274714] Modules linked in: fbcon tileblit font bitblit softcursor vga16fb vgastate bonding radeon ttm psmouse drm_kms_helper drm serio_raw edac_core i2c_algo_bit edac_mce_amd i2c_piix4 joydev hpilo bnx2 power_meter lp parport usbhid hid ahci pata_atiixp cciss
[132944.274714] Pid: 1604, comm: ossec-syscheckd Not tainted 2.6.32-53-server #115-Ubuntu ProLiant DL385 G7
[132944.274714] RIP: 0010:[<ffffffff812c1f9d>]  [<ffffffff812c1f9d>] copy_user_generic_string+0x2d/0x40
[132944.274714] RSP: 0018:ffff880c2ccafe40  EFLAGS: 00010246
[132944.274714] RAX: ffff880c2ccae000 RBX: ffff880c2ccafe98 RCX: 0000000000000100
[132944.274714] RDX: 0000000000000000 RSI: ffff880020000000 RDI: 00007fff62731620
[132944.274714] RBP: ffffffff81013c6e R08: ffff880001002000 R09: 00007f73a85e8700
[132944.274714] R10: 82ba700022726568 R11: 0000000000000246 R12: ffff880020000000
[132944.274714] R13: 0000000000000800 R14: 0000000000000800 R15: 0000000000000000
[132944.274714] FS:  00007f73a85e8700(0000) GS:ffff88084e480000(0000) knlGS:0000000000000000
[132944.274714] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[132944.274714] CR2: ffff880020000000 CR3: 0000000c2cd75000 CR4: 00000000000006e0
[132944.274714] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[132944.274714] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[132944.274714] Call Trace:
[132944.274714]  [<ffffffff811aa150>] ? read_kcore+0x250/0x360
[132944.274714]  [<ffffffff81053ce0>] ? __dequeue_entity+0x30/0x50
[132944.274714]  [<ffffffff8101178c>] ? __switch_to+0x1ac/0x320
[132944.274714]  [<ffffffff811a9f00>] ? read_kcore+0x0/0x360
[132944.274714]  [<ffffffff8119f271>] ? proc_reg_read+0x81/0xc0
[132944.274714]  [<ffffffff81147e85>] ? vfs_read+0xb5/0x1a0
[132944.274714]  [<ffffffff81148041>] ? sys_read+0x51/0x80
[132944.274714]  [<ffffffff81013172>] ? system_call_fastpath+0x16/0x1b


-Chris

dan (ddp)

unread,
Nov 27, 2013, 8:41:58 AM11/27/13
to ossec...@googlegroups.com
On Tue, Nov 26, 2013 at 10:31 PM, <ch...@expensify.com> wrote:
> Hi guys,
>
> I'm encountering this problem on an Ubuntu 10.04 server running OSSEC 2.7.0.
> It even occurred after I disabled rootcheck. I do have wildcards in my
> syscheck directories config, but I was under the impression from the thread
> that that bug was fixed already, and I don't see any "No such file or
> directory" problems. Was there ever a formal bug opened for this syscheckd
> problem, or a patch that was added to specifically address this issue?
> Here's the log of cpu soft lockup error; the only error in the ossec log is
> a mail error that is expected since don't allow this server to mail out. Any
> ideas?
>

You can find the source code repository at
https://bitbucket.org/jbcheng/ossec-hids
It includes a history of the commits.
> --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "ossec-list" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to ossec-list+...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

ch...@expensify.com

unread,
Dec 12, 2013, 6:51:49 PM12/12/13
to ossec...@googlegroups.com
I haven't been able to find the specific issue in the commits, of course, it's a very old issue, and it's possibly a different root cause. Anyway, digging a little deeper, I discovered that the process seems to be continually reading from /proc/kcore, with no sign of ever completing. Any idea what might be going on? I'm looking for documentation on what syscheck does with kcore, but not having much luck. I did find the security readme at http://rootcheck.sourceforge.net/README.security.txt, and it indicates that rootcheck confirm that /dev/core should be a link to /proc/kcore...could this logic have been moved into the syscheck binary and it's getting stuck performing this check?

read(13, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
read(13, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
read(13, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
read(13, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
read(13, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
read(13, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
read(13, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
read(13, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
read(13, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
^CProcess 1160 detached
$ sudo lsof -P -n -p 1160 -ad 13
COMMAND    PID USER   FD   TYPE DEVICE        SIZE/OFF       NODE NAME
ossec-sys 1160 root   13r   REG    0,3 140737486266368 4026531985 /proc/kcore
$

dan (ddp)

unread,
Dec 19, 2013, 8:02:46 AM12/19/13
to ossec...@googlegroups.com
On Thu, Dec 12, 2013 at 6:51 PM, <ch...@expensify.com> wrote:
> I haven't been able to find the specific issue in the commits, of course,
> it's a very old issue, and it's possibly a different root cause. Anyway,
> digging a little deeper, I discovered that the process seems to be
> continually reading from /proc/kcore, with no sign of ever completing. Any
> idea what might be going on? I'm looking for documentation on what syscheck
> does with kcore, but not having much luck. I did find the security readme at
> http://rootcheck.sourceforge.net/README.security.txt, and it indicates that
> rootcheck confirm that /dev/core should be a link to /proc/kcore...could
> this logic have been moved into the syscheck binary and it's getting stuck
> performing this check?
>

I'm guessing /proc/kcore should just be skipped. If you can, file a
bug at https://bitbucket.org/jbcheng/ossec-hids with this info,
hopefully someone can take a look at it soon.
Reply all
Reply to author
Forward
0 new messages