Controller write timeout

Roland Gude

Jan 17, 2012, 2:45:07 PM

to peregrine...@googlegroups.com

Hi,

I am currently running several feasibility tests with generated data.

Unfortunately, if the data becomes more complex than several bell curves, the controller crashes with a write timeout. I can work around the issue for a while by starting the controller with nice -20, but sooner or later I hit the wall.

The controller is running on a node which is a peregrine node as well. Is this bad practice? Is there any way to recover when the controller crashes?

Maybe a "distributed controller" would make sense to get rid of this failure source.

burtonator

Jan 17, 2012, 3:08:37 PM

to peregrine...@googlegroups.com

> The controller is running on a node which is a peregrine node as well. Is this bad practice? Is there any way to recover when the controller crashes?

Yes... this is the issue.

The controller has to write to logs, allocate memory, use CPU, etc.

If you run the controller on a peregrine node you're just going to starve it of resources.

Generally the controller shouldn't require too much CPU, so you can just run it on a thin/idle node... but if it has to compete with something VERY intensive, like a regular compute node which uses a TON of CPU / memory / disk, then you will almost certainly starve it.

Also, it makes management harder... because the controller needs something like 256MB to run, and it's going to be competing for that memory with regular tasks.

So just put the controller on a dedicated node and you are set.

I have some designs for a distributed controller... I'm going to shard the controllers and then have them write to a write-ahead log. This way the controllers can crash and we can shard their work and replicate the log...

Right now if the controller crashes you have to restart the job.
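
To make the write-ahead log idea above concrete, here is a minimal sketch in Java. It is only an illustration under stated assumptions and is not code from Peregrine: the class WriteAheadLog, its methods, the shardFor helper, and the length-prefixed record format are all hypothetical. It shows a controller appending each state change to a local file before acting on it, and replaying the file after a restart. Replicating the log to other nodes is not shown.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names are part of Peregrine.
final class WriteAheadLog implements AutoCloseable {
    private final Path path;
    private final FileChannel channel;

    WriteAheadLog(Path path) throws IOException {
        this.path = path;
        // APPEND keeps records written by a previous (possibly crashed) run.
        this.channel = FileChannel.open(path,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.APPEND);
    }

    // Assumed sharding rule: map a task id to one of shardCount controllers.
    static int shardFor(String taskId, int shardCount) {
        return Math.floorMod(taskId.hashCode(), shardCount);
    }

    // Record a state change on disk before the controller acts on it.
    synchronized void append(String record) throws IOException {
        byte[] payload = record.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(payload.length); // 4-byte length prefix
        buf.put(payload);
        buf.flip();
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        channel.force(true); // ask the OS to flush data and metadata to disk
    }

    // Read back every complete record, for example after a restart.
    List<String> replay() throws IOException {
        List<String> records = new ArrayList<>();
        ByteBuffer data = ByteBuffer.wrap(Files.readAllBytes(path));
        while (data.remaining() >= 4) {
            int length = data.getInt();
            if (length < 0 || length > data.remaining()) {
                break; // incomplete final record from a crash: ignore it
            }
            byte[] payload = new byte[length];
            data.get(payload);
            records.add(new String(payload, StandardCharsets.UTF_8));
        }
        return records;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

In this sketch a restarted controller would open the same file, call replay() to rebuild its view of which tasks had finished, and continue from there instead of restarting the whole job.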

Roland Gude

Jan 18, 2012, 3:19:23 AM

to peregrine...@googlegroups.com

OK, I will retry my tests with a dedicated controller.

Meanwhile, could you share more info about the distributed design (like a blueprint or sth)? I'd really like to hop on that boat and see what I can do.

burtonator

Jan 18, 2012, 4:03:07 PM

to peregrine...@googlegroups.com

sth ??

Do you have an example of what you would like?

More of a flow diagram of the messages?

burtonator

Jan 18, 2012, 5:53:06 PM

to peregrine...@googlegroups.com

Something like this?

http://i.imgur.com/O59Dx.png

Roland Gude

Jan 19, 2012, 10:51:53 AM

to peregrine...@googlegroups.com

Yeah, something like that, though a little more verbose, would be great.

burtonator

Jan 31, 2012, 3:16:05 PM

to peregrine...@googlegroups.com

I'll work on it... I'm trying to think of the right place to put all this information including javadoc, etc.

I've also had the flu, but now I'm back in the game :)

Also, I have Cassandra support really close now...

Roland Gude

Feb 1, 2012, 1:55:50 AM

to peregrine...@googlegroups.com

I hope you are well again.

Actually, I did not find the time to do anything with peregrine in the meantime either. But at least I created a Mercurial patch queue on Bitbucket to submit patches.

It can be found here:

https://bitbucket.org/rjtg/peregrine-patches

burtonator

Feb 6, 2012, 6:22:17 PM

to peregrine...@googlegroups.com

Cool!
