RXTX serial port library causes high CPU load

852 views
Skip to first unread message

Bernd Pfrommer

unread,
Oct 14, 2014, 6:26:35 PM10/14/14
to ope...@googlegroups.com
First a short write-up on performance issues I encountered with the nrjavaserial (RXTX) serial library on unix based hardware:

I noticed that the InsteonPLM binding (which uses the serial port library) was consuming about 20-30% CPU on an Intel Atom, in other words: excessive considering how little traffic it has to handle. Turns out the nrjavaserial library that provides the RXTX implementation is actually to be blamed, because it is polling in two places:

1) The moment a port is opened, a "RXTX Monitor Thread" is started. It is supposed to provide callbacks in case an EventListener is registered (they don't delay starting the thread until a listener is actually registered!). Instead of peacefully select()ing on a set of file descriptors, it selects() with a timeout of 1000us, meaning the thread wakes up 1000 times per sec, for no good reason.

2) If a blocking read() is called, the underlying native code (read_byte_array()) may also call select() with a non-null timeout parameter. Explicitly setting a timeout like this:

    port.disableReceiveThreshold();
    port.enableReceiveTimeout(10000);

    seemed to improve that problem.


Any of the bindings using the RXTX serial framework are affected by issue 1), and the CPU load multiplies with the number of ports opened (e.g. urtsi + insteonplm binding).

Not sure what to do about it. I tried fixing the nrjavaserial library but that's difficult because the native libraries will need to be compiled on a slew of different platforms to which I have no access.
An alternative is moving to the original, old RXTX library, which also has a timeout on the select call mentioned in 1), but it's 100 times larger and should be less CPU hungry.

Any suggestions on how to address this?

Thanks!

Davy Vanherbergen

unread,
Oct 15, 2014, 5:11:48 AM10/15/14
to ope...@googlegroups.com
I've been using the nrjavaserial library in a binding for a long time now without any issues or high memory consumption, so the problem your seeing doesn't affect all users.
I suggest you raise an issue at the nrjavaserial project itself: https://github.com/NeuronRobotics/nrjavaserial/issues
They have been very helpful in the past to get issues resolved.

Davy

Bernd Pfrommer

unread,
Oct 15, 2014, 6:33:25 AM10/15/14
to ope...@googlegroups.com
I have raised the issue with the nrjavaserial folks. Turns out this issue had been discovered before, but not fixed.

What OS are you running on? This only affects Unix based systems that use the select() system call. It doesn't affect memory consumption either, just CPU consumption. And the effect won't be terrible, but just annoying enough to notice. On my Intel I7 running a single binding, CPU usage is only 5%. Without the bug it's 1.5%.

But I just had an idea how to work around it: maybe registering a listener and then putting the monitoring thread to sleep() will do.

Paul Hampson

unread,
Oct 15, 2014, 12:34:14 PM10/15/14
to ope...@googlegroups.com
I've been wondering for a while now what has been making my server (HP Microserver N54L) sit with about 25% load all the time with openhab running (you can see it in `htop`). Maybe this is the answer.

If you can resolve it, it would be very helpful. If I get a chance and you haven't resolved it already I'll have a look myself as well.

Thanks!
Paul

Bernd Pfrommer

unread,
Oct 15, 2014, 12:36:40 PM10/15/14
to ope...@googlegroups.com
Paul,

What bindings are you running?

Davy Vanherbergen

unread,
Oct 17, 2014, 3:19:20 AM10/17/14
to ope...@googlegroups.com
What OS are you running on? This only affects Unix based systems that use the select() system call. It doesn't affect memory consumption either, just CPU consumption. And the effect won't be terrible, but just annoying enough to notice. On my Intel I7 running a single binding, CPU usage is only 5%. Without the bug it's 1.5%.


I'm using Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-35-generic x86_64) and I'm not seeing any continuous CPU consumption for openHAB.
Maybe it has to do with the way the bindings are implemented? The only binding which I'm using that uses nrjavaserial is the nikobusbinding.

Davy

Paul Hampson

unread,
Oct 17, 2014, 5:56:33 AM10/17/14
to ope...@googlegroups.com
org.openhab.binding.mqttitude-1.5.1.jar        
org.openhab.io.transport.mqtt_1.6.0.201410110101.jar
org.openhab.action.prowl-1.5.0.jar  
org.openhab.binding.networkhealth-1.5.0.jar       
org.openhab.persistence.gcal-1.5.0.jar
org.openhab.binding.astro-1.5.0.jar  
org.openhab.binding.ntp-1.5.0.jar       
org.openhab.binding.benqprojector_1.6.0.201409271640.jar  
org.openhab.binding.rfxcom-1.5.0.jar       
org.openhab.binding.exec-1.5.0.jar   
org.openhab.binding.wol-1.5.0.jar       
org.openhab.persistence.rrd4j-1.5.0.jar
org.openhab.binding.http-1.5.0.jar  
org.openhab.binding.zwave-1.6.0-SNAPSHOT.jar
org.openhab.binding.jeelinkha_1.6.0.201407252333.jar  
org.openhab.io.habmin-1.5.0-SNAPSHOT.jar
org.openhab.binding.maxcul_1.6.0.201410060001.jar  
org.openhab.io.myopenhab-1.4.0-SNAPSHOT.jar
org.openhab.binding.mqtt-1.5.1.jar  
org.openhab.io.transport.cul_1.6.0.201410052310.jar

The ones that utilse serial (all virtual serial ports over USB i believe) are:

maxcul (developed my me) via CUL transport
zwave
rfxcom
jeelinkha (developed by me, not pushed to main openhab release)

Any crossover?

Thanks,
Paul


--
You received this message because you are subscribed to a topic in the Google Groups "openhab" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/openhab/Ms4K1NCphbc/unsubscribe.
To unsubscribe from this group and all its topics, send an email to openhab+u...@googlegroups.com.
To post to this group, send email to ope...@googlegroups.com.
Visit this group at http://groups.google.com/group/openhab.
For more options, visit https://groups.google.com/d/optout.

Bernd Pfrommer

unread,
Oct 17, 2014, 6:17:23 AM10/17/14
to ope...@googlegroups.com
The binding hooks up a SerialPortEventListener, so at least it avoids the blocking read(), which should cut the CPU load in half. But there should still be CPU leakage.

So you are saying you see zero, absolutely zero CPU load? Check with "top". Even without any bindings running there should be some amount of CPU load (1.5%) due to the polling for file status (.items files etc).

Davy Vanherbergen

unread,
Oct 17, 2014, 6:30:44 AM10/17/14
to ope...@googlegroups.com
Yep, zero continuous load for openHAB. The java process running openhab doesn't even show up most of the time in top:

Tasks: 257 total,   2 running, 255 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.0 us,  3.2 sy,  0.0 ni, 95.7 id,  0.2 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   7770152 total,  7400640 used,   369512 free,   264640 buffers
KiB Swap:  7970812 total,     7016 used,  7963796 free.  1562584 cached Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
31605 davy      20   0   30656   1752   1168 R   0.7  0.0   0:00.26 top
  452 root       0 -20       0      0      0 S   0.3  0.0   7:39.11 txg_sync
 1214 root      20   0  191448   2068   1488 S   0.3  0.0   2:28.42 nmbd
 1269 www-data  20   0   90568   1788    468 S   0.3  0.0   6:02.12 nginx
    1 root      20   0   37212   6596   1528 S   0.0  0.1   0:05.69 init
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.30 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   1:10.26 ksoftirqd/0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:+
    7 root      20   0       0      0      0 R   0.0  0.0  16:11.75 rcu_sched
    8 root      20   0       0      0      0 S   0.0  0.0  14:36.97 rcuos/0
    9 root      20   0       0      0      0 S   0.0  0.0  16:02.65 rcuos/1
   10 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh
   11 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/0
   12 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcuob/1
   13 root      rt   0       0      0      0 S   0.0  0.0   0:15.82 migration/0
   14 root      rt   0       0      0      0 S   0.0  0.0   0:38.97 watchdog/0
   15 root      rt   0       0      0      0 S   0.0  0.0   0:37.46 watchdog/1

Davy Vanherbergen

unread,
Oct 17, 2014, 6:33:05 AM10/17/14
to ope...@googlegroups.com
FYI, the file polling runs only once every 30 seconds on my machine. There is no need to do that continuously.

folder:items=30,items
folder:sitemaps=30,sitemap
folder:rules=30,rules
folder:scripts=30,script
folder:persistence=30,persist

Bernd Pfrommer

unread,
Oct 17, 2014, 6:58:43 AM10/17/14
to ope...@googlegroups.com
Had a look at the source:

ZWave and RFXCom use blocking read(), not a SerialPortEventListener, and hence should be fully affected.
The cul transport uses a SerialPortEventListener, which still has some CPU leakage, but less than using blocking read().
If you have 4 serial ports open, this should give you about 15-20% CPU usage on a recent I7, and more like 40% on an Atom.

I'm surprised nobody has noticed that on low-end hardware like a Raspberry Pi. Hard to imagine it wouldn't rear its head there...

Chris Jackson

unread,
Oct 18, 2014, 8:28:56 AM10/18/14
to ope...@googlegroups.com
Bernd,
Thanks for pointing this out - this is definitely interesting. Looking around the web, there seems to be a lot of 'unhappyness' about RXTX. I must admit that the port driver side isn't something I've looked at in the zwave binding, but given this, and the RPi duplication problem with zwave (which is possibly an issue in the serial driver) I might take a look at an alternative library for zwave....

Any thoughts? There seem to be a few out there...

Chris

Davy Vanherbergen

unread,
Oct 18, 2014, 8:39:58 AM10/18/14
to ope...@googlegroups.com
Chris,

Before changing libs, wouldn't it make more sense to first change the zwave binding to not use a blocking read() anymore?

Davy

--

Chris Jackson

unread,
Oct 18, 2014, 8:43:28 AM10/18/14
to ope...@googlegroups.com
Maybe - but there’s also a long standing issue with zwave on the RPi which we’ve not been able to solve. It’s almost certainly not in the binding, so that leaves the library.

Also, if I understand Bernd’s comments, this is only half of the problem since the library runs every millisecond to service events (maybe I misunderstood that bit?).

Chris

Davy Vanherbergen

unread,
Oct 18, 2014, 8:54:32 AM10/18/14
to ope...@googlegroups.com
Using an event listener rather than a blocking read will be much more efficient. I'm using the lib 24/7 with event listeners in the nikobusbinding and haven't seen any CPU leakage yet.

Davy

Bernd Pfrommer

unread,
Oct 18, 2014, 9:02:07 AM10/18/14
to ope...@googlegroups.com
Chris,

Mind pointing to the RPi duplication issue? I can have a look at it.

The serial port implementation is inherently messy because it requires the use of JNI. Was this a reason Sun/Oracle decided to not support it anymore? Don't know.

I looked at the code base (mostly the native part, written in c) of the nrjavaserial fork of RXTX. It seems a lot of people have hacked around in it for a prolonged time, with nobody really feeling responsible for it. A good deal of the ugliness was inherited from the original RXTX code base. There are other libraries out there, but they are probably not ported to as many devices.  That's always an issue with native code: even if you manage to set up cross compilation, you have to have access to a bunch of different platforms for testing.

Just for kicks I tried to pull in my own serial library build, along the original one, but couldn't figure out how to put two different libraries exporting identical symbols into the serial transport bundle. If we do move to a different library we need an Eclipse/Maven/Build Environment wiz to come in and configure a second serial library for us so we can transition binding-by-binding.

Bernd Pfrommer

unread,
Oct 18, 2014, 9:21:50 AM10/18/14
to ope...@googlegroups.com
Davy,

It really beats my why you don't see CPU leakage, especially since you appear to be running on an identical system like me. I see leakage even for the URTSI binding that uses only the eventListener.


First, in the function initialise_event_info_struct( struct event_info_struct *eis ),
The sleep time for the select call is set up like this:

FD_ZERO( &eis->rfds );
if (eis->fd < FD_SETSIZE && eis->fd > 0) {
FD_SET( eis->fd, &eis->rfds );
eis->tv_sleep.tv_sec = 0;
eis->tv_sleep.tv_usec = 1000;
eis->initialised = 1;
return( 1 );
} else {
// you can reduce this limitation only with migration to epool or something like that.
}


Then in the eventloop, it is used:

JNIEXPORT void JNICALL RXTXPort(eventLoop)( JNIEnv *env, jobject jobj )
{
#ifdef WIN32
int i = 0;
#endif /* WIN32 */
struct event_info_struct eis;
eis.jclazz = (*env)->GetObjectClass( env, jobj );
eis.env = env;
eis.jobj = &jobj;
eis.initialised = 0;

ENTER( "eventLoop\n" );
if ( !initialise_event_info_struct( &eis ) ) goto end;
if ( !init_threads( &eis ) ) goto end;
unlock_monitor_thread( &eis );
do{
report_time_eventLoop( );
do {
/* nothing goes between this call and select */
if( eis.closing )
{
report("eventLoop: got interrupt\n");
finalize_threads( &eis );
finalize_event_info_struct( &eis );
LEAVE("eventLoop");
return;
}
#ifndef WIN32
/* report( "." ); */
do {
eis.ret = SELECT( eis.fd + 1, &eis.rfds, NULL, NULL,
&eis.tv_sleep );
} while (eis.ret < 0 && errno==EINTR);
#else
/*
   termios.c:serial_select is instable for some
   reason

   polling is not blowing up.
*/
/*
usleep(5000);
*/
eis.ret=1;
while( i++ < 5 )
{
if(eis.eventflags[SPE_DATA_AVAILABLE] )
{
if( port_has_changed_fionread( &eis ) )
{
send_event( &eis, SPE_DATA_AVAILABLE, 1 );
}
}
usleep(1000);
}
i = 0;
#endif /* WIN32 */
}  while ( eis.ret < 0 && errno == EINTR );
if( eis.ret >= 0 )
{
report_serial_events( &eis );
}
if ( !initialise_event_info_struct( &eis ) ) goto end;
} while( 1 );
end:
LEAVE( "eventLoop:  Bailing!\n" );
}


Why the select() on your machine should not wake up 1000 times a second I don't understand.

BTW, this issue has been raised before by others:



As I said in my other post, we should be cautious moving to a different serial library lest we break working systems.

Chris Jackson

unread,
Oct 18, 2014, 9:48:00 AM10/18/14
to ope...@googlegroups.com
Hi Bernd,
Do you have an RPi? It would be great if you could take a look…

I’m not sure though if there’s an actual issue raised on it, but there’s been a lot of discussion on the list. I’ve just created issue 1564 for this as it keeps coming up, and each time I have to search for the links. If you look at some of the logs it’s quickly obvious that there’s a lot of duplication - especially when you look at the startup messages where the node information is initially received. Basically, it seems that a single message is sent to the Pi, but we get multiple responses… It only happens on the Pi, and I put extra debug in the binding to make sure there wasn’t anything going on there, and it was clean...

It shouldn’t be too hard to move the binding to a different build of a lib - I’ve done it a few times, but there are a couple of places that you need to change things (and I’d need to remind myself where they are!).

Chris


Bernd Pfrommer

unread,
Oct 18, 2014, 5:44:22 PM10/18/14
to ope...@googlegroups.com
Chris,

I admit to having two pis, but they are preconfigured appliances, I'm loathe to start installing stuff on them.

Replacing the serial library is not hard. What I couldn't do was configure a second one alongside the original, so one binding could use the old one, another one the new one. There is potential switching to a new library will break stuff for people. Davy for instance seems to be completely happy with the current one:)

Bernd
Message has been deleted

Davy Vanherbergen

unread,
Oct 22, 2014, 5:51:47 AM10/22/14
to ope...@googlegroups.com
(Sorry if you are getting this multiple times. Seems my post got stuck in a spam filter somewhere)

Could it be that the leakage is too small to notice in top? I'm currently running on an 
E45M1I_DELUXE board with 8Gb of ram. 

When I check my VM using jvisualvm, the 'Nikobus Receiver' thread is pretty static in it's CPU consumption, i.e. it doesn't consume any CPU unless something comes across the serial port. Do you see the leaking CPU usage in jvisualvm (Sample > CPU > Thread CPU time) under the originator java thread which started the listener/read(), or is it not reported there?

Davy

Pauli Anttila

unread,
Dec 14, 2014, 8:55:07 AM12/14/14
to ope...@googlegroups.com
I have always wondered the cpu consumption of my system, but never had time to really focus on that. I run both rfxcom and epson projector bindings which both use blocking read. I just tested you (Bernd) idea, where eventListener is introduced with infinitive sleep and it seems to work. CPU consumption is much slower. It was around 20% before and now it below 10%.

--
You received this message because you are subscribed to the Google Groups "openhab" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openhab+u...@googlegroups.com.

Chris Jackson

unread,
Jan 10, 2015, 5:14:15 AM1/10/15
to ope...@googlegroups.com
For info, I added the same fix/change to the zwave binding and it definitely makes a big improvement to the the CPU performance.

Thanks :)

Chris

Robert Shepard

unread,
Jan 12, 2015, 2:10:33 PM1/12/15
to ope...@googlegroups.com
I can't help much here but I have a problem with serial port as well. My problem is the zwave binding fails on start up about 70% of the time.  I have to continually restart until, magically, the binding start correctly.  I have mentioned this before, thought I had it resolved, but nothing has worked consistently.  i have tried to pinpoint a cause, but nothing makes any difference. 

I am running fedora with openjava 1.7.   I don't think this plays a role.

I have both the insteonplm binding and the zwavebinding.  the insteon binding always works,  the zwave binding as mentioned, works occaisionally. The problem is not CPU load, in fact I dont' have that problem at all.  I use this -Dgnu.io.rxtx.SerialPorts=/dev/zwaveUSB  in the java start script.  NOTE that I do NOT add the insteon serial port to this.  IT doesn't appear to need this directive and indeed, it fails if I add the insteon port to the rxtx serial port statement.

The zwave problem is definitely a locking of the port problem.  When it fails to connect to the serial port correctly, I get this messages

TX Warning:  Removing stale lock file. /var/lock/LCK..insteonUSB
2015-01-08 09:21:20.614 [INFO ] [.service.AbstractActiveService] - ZWave Refresh Service has been started
RXTX fhs_lock() Error: opening lock file: /var/lock/LCK..insteonUSB: File exists. It is mine

���� testRead() Lock file failed

Initially, I thought this was the insteon binding.  BUT, it isn't.  when the zwave binding starts correctly, then I get this series in the logs:
2015-01-08 09:25:13.406 [INFO ] [.service.AbstractActiveService] - Exec Refresh Service has been started
2015-01-08 09:25:13.446 [INFO ] [.service.AbstractActiveService] - ZWave Refresh Service has been started
2015-01-08 09:25:13.557 [INFO ] [.service.AbstractActiveService] - HTTP Refresh Service has been started
2015-01-08 09:25:13.710 [INFO ] [.service.AbstractActiveService] - InsteonPLM has been started
RXTX Warning:  Removing stale lock file. /var/lock/LCK..insteonUSB
2015-01-08 09:25:13.851 [INFO ] [.service.AbstractActiveService] - NetworkHealth Refresh Service has been started

Note that the insteon binding correctly detects the existing lock file, recognizes it as its own, and removes it and establishes a new lock.

the zwave binding never accomplishes this and I have to manually remove the lock file if it exists.  However, even when the does not exist, the zwave binding fails with the above log.

This is not a permission issue (because the insteon binding has no permission issues, I am running as root, and I have root:root in the /etc/default/openhab file) .  I would like to be able to run as openhab, but can't until this issue is resolved.

thanks for looking at this.


Vlad Lyseyko

unread,
Jan 15, 2015, 1:15:17 PM1/15/15
to ope...@googlegroups.com
Hi Robert,
I have exactly the same problem and my approach was to make the zwave binding capable of connecting to the TCP port, you can read my post here https://groups.google.com/forum/#!topic/openhab/epGh6SCqI6w. I don't know how to program in java but I don't think it should be that difficult to add this.
Reply all
Reply to author
Forward
0 new messages