I have a Python program that goes up to 100% CPU. Just like this (top):
  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
80212 user1       2  44    0 70520K 16212K select  1   0:30 100.00% /usr/local/bin/python process_updates_ss_od.py -l 10
I have added extra logging, and it turns out that there are two threads.
One thread is calling "time.sleep()" and the other is calling "os.stat".
(Actually it is calling os.path.isfile, but I tracked it down to the last
link in the chain.) The most interesting thing is that the process is in
the "select" state. As far as I know, CPU load should be 0% because the
"select" state should block program execution until the I/O completes.
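For anyone who wants to reproduce the "which thread is doing what" check without adding log lines everywhere, here is a minimal sketch. It relies on sys._current_frames(), which exists in CPython 2.5 and later; the function name dump_thread_stacks is made up for this illustration and is not taken from the original program.

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return a text report with the current Python stack of every thread."""
    # Map thread identifiers to their human-readable names.
    names = dict((t.ident, t.name) for t in threading.enumerate())
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):" % (ident, names.get(ident, "?")))
        # format_stack() yields one entry per frame, innermost call last.
        lines.extend(entry.rstrip() for entry in traceback.format_stack(frame))
    return "\n".join(lines)
```

Calling this from a signal handler or a watchdog thread shows the innermost Python frame of each thread. It cannot show what the kernel is doing inside a system call such as stat().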
I must also tell you that the os.stat call takes a long time because this
system has about 7 million files on a slow disk. It would be normal for
an os.stat call to return after 10 seconds. I have no problem with that.
But I think that the 100% CPU is not acceptable. I guess that the code
is running in kernel mode. I think this because I can send a KILL signal
to it and the state changes to the following:
  PID USERNAME  THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
80212 user1       2  44    0 70520K 15256K STOP    5   1:27 100.00% /usr/local/bin/python process_updates_ss_od.py -l 10
So the state of the process changes to "STOP", but the program does not
stop until the os.stat call returns (sometimes after 30 seconds).
Could it be a problem with the operating system? Is it possible that an
os.stat call requires 100% CPU from the OS? Or is it a problem
with the Python implementation?
(Unfortunately I cannot give you an example program. Giving an example
would require giving you a slow I/O device with millions of files on it.)
OS version: FreeBSD 8.1-STABLE amd64
Python version: 2.6.6
Thanks,
Laszlo
What's your operating system and file system? A better file system or
system setting may improve your performance a lot. XFS, or ext3 / ext4
with a hashed directory index, reduces the cost of a file lookup from
O(n) to roughly O(1).
> Could it be a problem with the operating system? Is it possible that an
> os.stat call requires 100% CPU power from the OS? Or is it a problem
> with the Python implementation?
It's most likely a problem with the OS, hardware and/or your
configuration. Python doesn't come with its own stat() implementation.
os.stat() just wraps the libc's stat() function. The heavy lifting is
done inside libc and the kernel.
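One way to check where the time goes is to compare user and system CPU time around a batch of stat calls. The following is only a sketch: stat_cpu_breakdown is a hypothetical helper name, and os.times() reports totals for the whole process, so CPU used by other threads during the measurement is counted too.

```python
import os
import time


def stat_cpu_breakdown(paths):
    """Call os.stat on each path; report wall, user and system CPU time."""
    cpu_before = os.times()   # index 0 = user time, index 1 = system time
    wall_before = time.time()
    found = 0
    for path in paths:
        try:
            os.stat(path)
            found += 1
        except OSError:
            # Missing or unreadable path; os.path.isfile would return False.
            pass
    wall_after = time.time()
    cpu_after = os.times()
    return {
        "found": found,
        "wall": wall_after - wall_before,
        "user": cpu_after[0] - cpu_before[0],
        "system": cpu_after[1] - cpu_before[1],
    }
```

If "system" accounts for most of "wall", the CPU is being spent inside the kernel rather than in Python or libc. If both "user" and "system" are small compared to "wall", the process is mostly waiting.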
Christian
There is a chance that the CPU usage actually comes from the thread
doing sleep(). If you have a very short sleep time, and a loop around
it, it may spin in this loop.
If it's not that, and if it's not any other unrelated application that
uses CPU that you didn't mention, then chances are high that it's indeed
the file system code of your operating system that consumes that much
CPU time.
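To illustrate the first possibility, here is a small sketch that measures how much CPU a polling loop burns for a given sleep interval. The function name is invented for this example. It uses time.process_time(), which needs Python 3.3 or later; on Python 2.6 one would take the difference of os.times() values instead.

```python
import time


def cpu_fraction_of_poll_loop(sleep_seconds, duration=0.2):
    """Run a sleep-based polling loop and return CPU time / wall time."""
    cpu_start = time.process_time()   # user + system CPU of this process
    wall_start = time.time()
    while time.time() - wall_start < duration:
        time.sleep(sleep_seconds)
    wall = time.time() - wall_start
    cpu = time.process_time() - cpu_start
    return cpu / wall
```

With sleep_seconds set to 0 the loop spins and the fraction should be close to 1. With an interval such as 0.05 it should be close to 0.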
Regards,
Martin
Thank you,
Laszlo
> So the state of the process changes to "STOP", but the program does not
> stop until the os.stat call returns (sometimes after 30 seconds).
>
> Could it be a problem with the operating system? Is it possible that an
> os.stat call requires 100% CPU power from the OS? Or is it a problem
> with the Python implementation?
It's the OS kernel. If it was Python or the C library, sending SIGKILL
would result in immediate termination.
Is the disk interface operating in PIO mode? A slow disk shouldn't cause
100% CPU consumption; the OS would just get on with something else (or
just idle) while waiting for data to become available. But if it's
having to copy data from the controller one word at a time, that could
cause it (and would also make the disk appear slow).
vfs.ufs.dirhash_maxmem: 134217728
> It's most likely a problem with the OS, hardware and/or your
> configuration. Python doesn't come with its own stat() implementation.
> os.stat() just wraps the libc's stat() function. The heavy lifting is
> done inside libc and the kernel.
Great, so I should ask the BSD list about this.
I'll first try to rewrite the code and replace those several million
files with one very big hash database (gdbm). I guess that will be faster.
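A rough sketch of that idea, assuming each file becomes one key in a single database. It uses the generic dbm module of Python 3, which picks gdbm when it is available; on Python 2.6 the equivalent modules are anydbm and gdbm. The function names are only placeholders.

```python
import dbm  # Python 3 name; Python 2.6 has anydbm / gdbm instead


def put_record(db, name, data):
    """Store the bytes of one former file under its name."""
    db[name.encode("utf-8")] = data


def has_record(db, name):
    """Replacement for os.path.isfile(name): one hash lookup, no stat()."""
    return name.encode("utf-8") in db


def get_record(db, name):
    """Return the stored bytes for name; raises KeyError if it is absent."""
    return db[name.encode("utf-8")]
```

The handle comes from dbm.open(path, "c"), which creates the database if it does not exist. It should be kept open for the life of the process and closed with db.close() on exit.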
Thanks.
Laszlo
I'm working on a different storage method. We will be storing these
logs in a real database instead of separate CSV files. So probably this
problem will cease.
Thanks,
Laszlo