Actionhero lagging! Need assistance!


D Doinov

Feb 11, 2015, 4:00:39 AM
to action...@googlegroups.com
Hello, 
I'm developing a service with actionhero and I've run into a big problem: the whole service doesn't always respond when it's under heavy load.
A little explanation first:

The service basically serves one file (10 KB), and that script then manages access to the service by calling an endpoint that returns a JSON response no more than a few KB in size.

At first the problem was MySQL. I fixed that with a result cache so the DB server doesn't stall the whole process; everything is cached on an actual dedicated Redis server.

So far so good. 

But the whole thing still fails to respond when it gets a few thousand requests, which is not supposed to happen.
What worries me even more is that when I access the /status endpoint it reports 30K+ active connections, which should not even be possible.
A request for content takes between 5 and 30 ms depending on whether the content was cached, and 99% of the time it is cached.

I don't understand where the problem is coming from.

Does anyone have suggestions? I need to get this fixed and I'm out of ideas.

It's just frustrating that an entire server cannot process a simple SELECT statement when it needs to, or even return a cached result. And how could so many connections be open when no more than 600-700 requests are being made in a few seconds?


Bryan Tong

Feb 11, 2015, 4:04:51 AM
to D Doinov, action...@googlegroups.com
Sounds like you have a bunch of clients doing keep-alive, which would explain all the open connections.

And from a preliminary standpoint, without seeing any code, it sounds like you have something blocking the event loop. Just out of curiosity, are you running a cluster? How many workers do you have?

Have you considered putting a caching proxy in front of your Node implementation?


--
You received this message because you are subscribed to the Google Groups "actionHero.js" group.
To unsubscribe from this group and stop receiving emails from it, send an email to actionhero-j...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
eSited LLC

D Doinov

Feb 11, 2015, 4:24:52 AM
to action...@googlegroups.com, ddoi...@gmail.com
First, I'm not that familiar with Node and actionhero.

But from what I've read and the results I'm getting, there shouldn't be anything blocking the event loop. After all, from what I understand about the workings of the whole project, if the loop is blocked then I just don't get anything back from the service; I've encountered that problem before :) . But if I'm getting the result, doesn't that mean the loop has finished and the connection should be closed?

As far as the code goes, the piece that accesses the service is a pretty standard AJAX request:
if (win.XMLHttpRequest) { // code for IE7+, Firefox, Chrome, Opera, Safari
    Obj = new XMLHttpRequest();
} else { // code for IE6, IE5
    Obj = new ActiveXObject("Microsoft.XMLHTTP");
}
Obj.open("GET", "http://something....", true); // open() needs a method argument
Obj.send();

What I do have in the project is a lot of functions (for managing different actions) that do not use the callback-style syntax: function(params, success(), error()).
Some of them just record something and that's it; some of them parse a value and return it.

Could that be an issue?
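Worth noting here: "blocking the event loop" does not only mean a total hang. A synchronous, CPU-heavy step inside an otherwise working handler still stalls every other pending request while it runs, even though callers eventually get results. A hypothetical sketch (no names from the thread):

```javascript
// Hypothetical sketch: the handler "works" and callers get an answer,
// but the synchronous loop stalls the event loop for every other
// pending request while it runs.
function slowHandler(params, success) {
  let total = 0;
  for (let i = 0; i < 1e7; i++) total += i; // CPU-bound, synchronous
  success(total); // callers still get a result, just late
}

slowHandler({}, (result) => console.log('result:', result));
```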

Bryan Tong

Feb 11, 2015, 5:15:16 AM
to D Doinov, action...@googlegroups.com
I understand.

Are you running a single worker process?

Check your CPU usage when it's lagging. Are you using all of one core?


D Doinov

Feb 11, 2015, 6:11:51 AM
to action...@googlegroups.com, ddoi...@gmail.com
The server has 8 CPUs. The statistics when it's lagging say 500%, or 50% of the total, something like that. And I'm not sure how to check how many worker processes I have.

Bryan Tong

Feb 11, 2015, 6:13:15 AM
to D Doinov, action...@googlegroups.com
If you are on bash, try something like:

ps aux | grep node

That should bring up any node processes you have running.



D Doinov

Feb 11, 2015, 6:29:19 AM
to action...@googlegroups.com, ddoi...@gmail.com
ps aux | grep node shows 2 processes from 2 separate directories, which is how it's supposed to be.

The top command shows one "node" process.

But it's not lagging right now.

D Doinov

Feb 11, 2015, 6:36:07 AM
to action...@googlegroups.com, ddoi...@gmail.com
When it's lagging, the usual output of the top command is either half the CPUs on MySQL or half the CPUs on node. No matter what I try, it taxes the server very heavily. I can't seem to find where the problem is. Those requests are supposed to take a couple of milliseconds each; instead they take forever.

D Doinov

Feb 11, 2015, 6:44:54 AM
to action...@googlegroups.com, ddoi...@gmail.com
Is there a configuration option or an API method I can call to force the connection closed once the response is sent?

Evan Tahler

Feb 11, 2015, 12:25:21 PM
to D Doinov, action...@googlegroups.com
There are a number of ways you can end up with a ton of active connections:

- Slow actions (is your DB pegged?). Keep in mind that the action will keep "running" even if the client times out, navigates away from the page, etc. MySQL also tends NOT to scale linearly. When you hit these bottlenecks, run `SHOW FULL PROCESSLIST` on MySQL to see how many queries are active at once. From what you describe, MySQL might be having the issue...
- Double callbacks. It's possible to craft a bad action that calls back twice. This is bad and can hurt you.
- Multiple requests from the client. Is there a JS bug on the client? You might be calling the API more than once. Keep-alive may be the problem here (as Bryan points out).

Without seeing your application, I would suggest that you add a *ton* of logging so you can start narrowing down which of these is the culprit.
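The double-callback case is worth spelling out. All names below are hypothetical, but the shape of the bug is common:

```javascript
// Hypothetical action body with the classic double-callback bug: on an
// error both branches fire, so `next` runs twice for one request.
function buggyRun(cache, key, next) {
  cache.get(key, (err, value) => {
    if (err) next(err);   // BUG: missing `return` here...
    next(null, value);    // ...so this line also runs after an error
  });
}

// Demonstration with a fake cache that always errors:
let calls = 0;
buggyRun({ get: (k, cb) => cb(new Error('miss')) }, 'user:1', () => { calls += 1; });
console.log('next was called ' + calls + ' times'); // 2
```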

Chad Robinson

Feb 18, 2015, 11:10:48 PM
to action...@googlegroups.com, ddoi...@gmail.com
This may or may not be related, but this seems like a good thread to share an experience. I'm a huge advocate of build-process tools, and we use Jenkins plus Istanbul (and Mocha) to do code coverage. We use a command in our Jenkins build like the following (included here because it took us a while to sort out; maybe it will help somebody else):

JUNIT_REPORT_STACK=1 JUNIT_REPORT_PATH=report.xml NODE_ENV=test node ./node_modules/.bin/istanbul cover -- ./node_modules/.bin/_mocha -R mocha-jenkins-reporter
node ./node_modules/.bin/istanbul report cobertura

The ugly command lines are because Windows doesn't like executing Node.js scripts directly; putting "node" before each command works better on those machines.

This produces a nice report that Jenkins can import as a build artifact and show history over time. We've been using this to drive our API test coverage as high as possible (and also as a defensive shield against new requests from Product when we feel overall quality is dropping; a chart that proves it really helps make the case that some refactoring and "love" is necessary!).

Anyway, there was a branch in our code that we hadn't tested, where we were trapping exceptions from SequelizeJS. We hadn't tested it because nothing had ever failed, so we wrote a hack that uses the beforeCreate/beforeFind/etc. hooks in SequelizeJS to pretend the database had failed. And everything started hanging up in almost exactly the way you described: no trouble at low rates, lots of trouble during load testing.

It turns out that when we were processing the exception, we weren't reporting the error to ActionHero properly. ActionHero is a lot like Express in being callback-driven, and you must make ABSOLUTELY SURE that:
  1. The callback is ALWAYS called, no matter how the code works out, and
  2. The callback is ONLY called a single time.
It's literally the first thing we check for now. Calling next() more than once, not calling it at all, or calling it with something unexpected (the second parameter = false is NOT for reporting errors - grin) is a big problem.
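One defensive pattern (a generic sketch, not an ActionHero API) is to wrap the callback so a second call fails loudly instead of silently double-completing the request:

```javascript
// Generic guard: the wrapped callback can only fire once; a second call
// throws instead of silently corrupting the request state.
function once(fn) {
  let called = false;
  return function (...args) {
    if (called) throw new Error('callback called twice');
    called = true;
    return fn(...args);
  };
}

// Usage inside a hypothetical callback-driven handler:
function run(fakeDb, next) {
  const done = once(next);
  fakeDb.query('SELECT 1', (err, rows) => {
    if (err) return done(err); // the error path returns...
    done(null, rows);          // ...so the success path cannot also fire
  });
}
```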

Code coverage tests really saved our bacon here, because this is one of those things developers rarely test for and QA never really "sees". It's hard to simulate a total database failure (not an error, a complete loss), so we often write it off and hope it doesn't happen much; then six months later when it does, we shrug our shoulders and write it off as a freak issue, even though we could have caught it. In this case, by leaving "hanging" connections, it could have killed our production environment by swamping it with zombie sockets that never got closed. A classic cascade failure: as nodes die under load, traffic shifts to others... which have the same bug!

I highly recommend you check the items above, but I also highly recommend a code coverage tool. It's simple enough to run and can be a huge debugging aid for resolving problems like this.