Is restarting the gearmand the only way to get it to release that memory?
On Linux, yes. This is how the Linux memory allocation system works.
--
Brian.
--------
http://brian.moonspot.net/
Err, not necessarily. It's how the application is designed: it uses
pools of memory and never free()s them back to the OS later.
The perl daemon does this too, but in that case it's because of how perl
handles memory allocation.
Just calling free() doesn't mean the OS gets the memory back, either.
It just means that the chunk of memory that was released is available
for the next call to malloc().
To free back to the OS, all the "free" blocks controlled by malloc
need to be in a contiguous block at the end of the memory space, and
the OS has to provide a system call to return the memory to it.
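For what it's worth, glibc does expose a call for this: malloc_trim() asks
the allocator to hand free pages at the top of the heap back to the kernel,
subject to exactly the fragmentation caveat above. A minimal sketch
(glibc-specific, and not something gearmand itself does as far as I know):

/* Sketch only: shows that glibc's allocator *can* return memory to the OS
 * via malloc_trim(), subject to the fragmentation caveats above.
 * This is not gearmand code. */
#include <malloc.h>   /* malloc_trim() -- glibc-specific */
#include <stdio.h>
#include <stdlib.h>

#define CHUNKS 2048
#define CHUNK_SIZE (32 * 1024)   /* small enough to come from the heap, not mmap */

int main(void)
{
    char *chunks[CHUNKS];

    /* Grow the heap with lots of small allocations... */
    for (int i = 0; i < CHUNKS; i++) {
        chunks[i] = malloc(CHUNK_SIZE);
        if (chunks[i] == NULL)
            return 1;
        chunks[i][0] = 1;        /* touch it so the page is really there */
    }

    /* ...then free all of them.  The pages stay with malloc, not the OS. */
    for (int i = 0; i < CHUNKS; i++)
        free(chunks[i]);

    /* Ask glibc to return the free space at the top of the heap to the
     * kernel.  Returns 1 if any memory was released, 0 otherwise. */
    printf("malloc_trim released memory: %d\n", malloc_trim(0));
    return 0;
}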
No idea what the deal is with the C guys.
The C daemon doesn't. It's been claimed that it can't.
On Nov 22, 2010, at 4:08 PM, Tim Uckun wrote:
>> No idea what the deal is with the C guys.
>
>
> The C daemon doesn't. It's been claimed that it can't.
John Pettitt
Email: j...@p.tt
From what I understand, the issue is that gearmand allows unlimited
additions to the queue, not that it won't return memory to the system.
Because really, if it can't *keep* the RAM, then you can't let it
allocate it all even at peak times and have a stable system, even if it
does return it to the system.
So I'd say the feature request is for a limit on memory usage, which
would be pretty easy to implement, though it may require you to make
sure your code can handle errors when trying to submit jobs.
I'd also say that if you have a situation where you can't restart
gearmand's at will, you may not be taking advantage of its fault
tolerance. With a persistent queue your background jobs won't be lost,
and if you *can* lose jobs, then what's the resistance to an occasional
restart anyway?
One option in C++ (this might also work in C?) is to replace the malloc() implementation with TCMalloc from http://code.google.com/p/google-perftools/ so that mmap() is used and memory can actually be given back to the OS later. Essentially you tune TCMalloc to use only mmap(), with a delay before releasing memory to the OS so that the number of system calls made by applications using malloc() is reduced. Doing this rework should result in lower overall memory usage, and should cause no meaningful change in CPU utilization from using mmap() instead of brk().
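If anyone wants to experiment without patching gearmand, tcmalloc can
usually be swapped in just by linking against it (or LD_PRELOADing
libtcmalloc), and gperftools also ships a small C interface for telling it
to release memory immediately instead of gradually. A rough sketch,
assuming the header path and function name I remember
(google/malloc_extension_c.h and MallocExtension_ReleaseFreeMemory);
verify against the headers in your gperftools/google-perftools version:

/* Rough sketch: force-return free memory to the OS when tcmalloc is linked
 * in (e.g. gcc foo.c -ltcmalloc).  Header path and function name are from
 * memory -- check your gperftools install (newer versions use gperftools/). */
#include <stdio.h>
#include <stdlib.h>
#include <google/malloc_extension_c.h>   /* gperftools C shim (assumed path) */

int main(void)
{
    char *p = malloc(16 * 1024 * 1024);  /* served by tcmalloc when linked in */
    if (p == NULL)
        return 1;
    p[0] = 1;
    free(p);                             /* goes back to tcmalloc's free lists */

    /* Ask tcmalloc to hand its free spans back to the kernel now, instead of
     * waiting for its gradual release behavior. */
    MallocExtension_ReleaseFreeMemory();

    printf("asked tcmalloc to release free memory back to the OS\n");
    return 0;
}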
I do know that Eric Day had a C++ port of gearman going, so it's possible that you could try it in that fork to see if it solves the problem.
Another option would be to redo the queue storage so that nothing is held in memory at all: just save the job list to a persistent (on disk) store and access it from there. Then memory usage could remain basically constant regardless of the size of your queue. Of course that would come with its own performance penalty, but it could allow operation in use cases where memory is scarce.
Adrian
Personally I think this makes the most sense, because it would give you
the option of manually putting things into the queue if you were using
a database. Maybe that's not a good idea, but it would be an option.
If all you need is a memory allocation limit, then you can do that now
with ulimit (bash builtin, called something else on tcsh). Set an
arbitrary limit on virtual memory and malloc will fail once you reach
that limit. Of course, gearmand might not like that... You'll have to
test it.
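If you'd rather not rely on shell limits (say gearmand is launched from an
init script rather than a shell), the same cap can be set from a tiny C
wrapper that calls setrlimit() before exec'ing gearmand. A sketch; the
512 MB figure and the gearmand path are just placeholders:

/* Tiny wrapper: cap the address space, then exec gearmand.  The limit value
 * and binary path are placeholders -- adjust for your setup. */
#include <stdio.h>
#include <sys/resource.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    (void)argc;
    struct rlimit rl;
    rl.rlim_cur = 512UL * 1024 * 1024;   /* soft limit: 512 MB of address space */
    rl.rlim_max = 512UL * 1024 * 1024;   /* hard limit */

    if (setrlimit(RLIMIT_AS, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }

    /* Replace ourselves with gearmand (passing our arguments through);
     * malloc() inside it will now fail (return NULL) once the process hits
     * the limit, same as ulimit -v. */
    execv("/usr/bin/gearmand", argv);
    perror("execv");
    return 1;
}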
> I'd also say that if you have a situation where you can't restart
> gearmand's at will, you may not be taking advantage of its fault
> tolerance. With a persistent queue your background jobs won't be lost,
> and if you *can* lose jobs, then whats the resistance to an occasional
> restart anyway?
Sure, gearman is great with redundancy, but I don't want to frequently
enter a situation where I depend on it. Failure should be an edge
case. You don't want people thinking that gearman is unstable during
regular operation.
By the way, what happens to a worker which is already working on a job
when the job server goes down? Doesn't the worker need to report
success to the queue so that the job is removed? Since servers cannot
share the same storage backend, the worker cannot report success back
to a different server...
That's one of the shortcomings of the gearman protocol. There is no
server-to-server replication of jobs. Any jobs queued on the failed
server will never be submitted unless the gearmand on that server comes
back up. If the server is dead for real, those jobs are lost forever.
If you used a central database as a persistence engine you can fire up
a gearman server and connect it to the same table and then connect all
the workers to it, but that's a lot of work and gets you back to the
single point of failure, which is the database.
"Matt Harrington" <quer...@gmail.com> wrote:
>On Nov 22, 2010, at 16:51, Clint Byrum <cl...@fewbar.com> wrote:
>> So I'd say the feature request is for a limit on memory usage, which
>> would be pretty easy to implement, though it may require you to make
>> sure your code can handle errors when trying to submit jobs.
>
>If all you need is a memory allocation limit, then you can do that now
>with ulimit (bash builtin, called something else on tcsh). Set an
>arbitrary limit on virtual memory and malloc will fail once you reach
>that limit. Of course, gearmand might not like that... You'll have to
>test it.
>
Great idea. I think gearmand actually handles that situation well. It may even be worth adding a test suite to make sure this is the case.
>> I'd also say that if you have a situation where you can't restart
>> gearmand's at will, you may not be taking advantage of its fault
>> tolerance. With a persistent queue your background jobs won't be lost,
>> and if you *can* lose jobs, then whats the resistance to an occasional
>> restart anyway?
>
>Sure, gearman is great with redundancy, but I don't want to frequently
>enter a situation where I depend on it.
Why not? Fault tolerance that never undergoes the occasional fail test is hard to trust.
>Failure should be an edge
>case. You don't want people thinking that gearman is unstable during
>regular operation.
>
Quite right. Even more important though is making sure people trust gearman's natural fault tolerance.
>By the way, what happens to a worker which is already working on a job
>when the job server goes down? Doesn't the worker need to report
>success to the queue so that the job is removed? Since servers cannot
>share the same storage backend, the worker cannot report success back
>to a different server...
Workers will continue processing jobs from servers that are up. If you have a fault-tolerant persistent queue, one just needs to start a new gearmand with the same backend and the job will be redone eventually. If you can't have jobs done twice, gearman offers little help. Distributed transactions are a tough thing to get right, whereas making jobs idempotent usually isn't.
> If you used a central database as a persistance engine you can fire up
> a gearman server and connect it to the same table and then connect all
> the workers to it but that's a lot of work and gets you back to the
> single point of failure which is the database.
I would gladly give up some performance to allow multiple gearmand
instances the ability to share one backend database table.
That's possible now. You can have every instance use a different
table so there won't be conflicts.
Sure, and that is what I do now, but it isn't the same as sharing state.
If I lose a server and there were jobs in it, those jobs are lost
until I roll out of bed and do something about it. I guess I would
also lose/repeat the jobs in progress when the server does go down.
Sharing state would be slick but would obviously require some core
changes and a drop in performance. That and a redundant backend store
(on my todo list to replicate the db) sure would be a nice all-around
package for my needs.
I used heartbeat before to manage gearmand+libtokyocabinet+drbd. There
would have been no "rolling out of bed". If gearmand died, the other box
would take the IP, mount the drbd volume, and start gearmand, gaining
back all of the unfinished jobs. The workers would automatically
reconnect and keep using it as if nothing but a few transient errors had
happened.
This was non-obvious with workers written using the pecl module... I had
to add some error handling in the code to tear down and recreate the
worker objects, but that was mostly because of bugs in libgearman which
may have gone away.
> Sharing state would be slick but would obviously require some core
> changes and a drop in performance. That and a redundant backend store
> (on my todo list to replicate the db) sure would be a nice all-around
> package for my needs.
The state is shared via the backend. Since the only time this matters is
when *gearmand dies*, it's sort of out of gearmand's ability to fix this
problem.
I do think it would be useful to have an admin command that says "read
everything out of the persistence store" so you don't have to start a
new gearmand to get the queues into another gearmand.
That's a dangerous game you're playing for not much gain, but I see your
point.
> >
> > So I'd say the feature request is for a limit on memory usage, which
> > would be pretty easy to implement, though it may require you to make
> > sure your code can handle errors when trying to submit jobs.
> >
> > I'd also say that if you have a situation where you can't restart
> > gearmand's at will, you may not be taking advantage of its fault
> > tolerance. With a persistent queue your background jobs won't be lost,
> > and if you *can* lose jobs, then whats the resistance to an occasional
> > restart anyway?
>
> We actually aren't taking advantage of gearmand's fault tolerance in
> most cases because it's not suitable for our uses. We're dealing with
> high volume web traffic, some of which requires asynchronous work to
> be done. In the interest of freeing up apache as quickly as possible
> to service another request, we fire off a background job on localhost.
> With failover, we'd have the potential for a one second timeout from
> *each* apache request plus the additional round trip to the next
> server in the list.
This actually doesn't have to be true.
You can use a process monitor for gearmand that will tell you *the
instant* it dies. Upstart (my favorite, being a Canonical employee and
Ubuntu server team member ;)) would look something like this (say,
/etc/init/gearmand.conf):
start on (net-device-up IFACE=lo)
stop on runlevel [016]
respawn

post-start script
    touch /var/run/use-local-gearmand
end script

exec /usr/bin/gearmand

pre-stop script
    rm /var/run/use-local-gearmand
end script
This will have several effects:
1) gearmand will be started again instantly if it dies, so the window
for timeouts is VERY small
2) you can change your code to check for /var/run/use-local-gearmand and
when it's not there, use the pool of other servers. On a crappy LAN
you're still only talking about 10ms of latency.
3) Can't handle the latency? Just write a log file of "pending gearmand
submits" and have a cron job that shoves these into gearmand every 5
minutes.
> AFAIK the gearman php extension still doesn't do
> any load balancing between gearman servers but just moves to the next
> one on the list, so now suddenly the next server on the list will be
> slammed with twice as many jobs as it usually gets, possibly causing
> slowdowns on the other server too.
Such is the nature of fault tolerance and N+1 architectures. What's the
point of having lots of servers if you can't take one down here and
there for maintenance?
> It wouldn't take long for requests
> to pile up and cause a backlog of traffic the web server can no longer
> handle. So we prefer to operate under the assumption that it is not OK
> to restart gearmand when a server is in production. So right now if
> gearmand is using more memory than we'd like, we wait for off peak
> time, take the server out of production and then restart gearmand. It
> works, but it's also enough of a pain that our CIO wants to look into
> alternatives.
>
So really, in this scenario, gearmand is like a giant muddy pond between
your stream of web requests and your stream of workers. As long as the
flow out is the same as the flow in, the pond does not expand. But as
soon as the flow in exceeds the flow out, the pond must expand, causing
damage as it floods the area around it.
There are a few ways to handle this:
1) levees around the pond (ulimit). This moves the flood upstream (to
the web requests).
2) dam upstream (shut down/throttle web requests) This moves the flood
to other streams.
3) expand downstream (more workers) This creates a massive downstream
that sits empty most of the time (see: Los Angeles River for an example)
4) dig a canal between the ponds.
This last idea would be fairly simple. Write a worker that is only
activated when queue lengths have grown too long on one gearmand. The
worker does something like this:
$worker = new GearmanWorker();
$worker->addServer('flooded-pond');
$client = new GearmanClient();
$client->addServer('dry-pond');

while (true) {
    // obtain list of functions that are flooded (e.g. via the "status"
    // admin command); get_list_with_status_command() is a placeholder
    $flooded_functions = get_list_with_status_command();
    if (count($flooded_functions)) {
        foreach ($flooded_functions as $func) {
            // pass $client as context so canal() can re-submit the job
            $worker->addFunction($func, 'canal', $client);
        }
        // drain for a while
        for ($i = 0; $i < 1000; ++$i) {
            $worker->work();
        }
    } else {
        $worker->unregisterAll();
        sleep(30);
    }
}

function canal($job, $client) {
    // re-queue the job on the dry pond and mark it complete on the flooded one
    $client->doBackground($job->functionName(), $job->workload());
    $job->sendComplete('');
}
You could even get really clever and distribute the jobs evenly amongst
the other gearmands.
>
>> From what I understand, the issue is that gearmand allows unlimited
>> additions to the queue, not that it won't return memory to the system.
>>
>> Because really, if it can't *keep* the RAM, then you can't let it
>> allocate it all even at peak times and have a stable system, even if it
>> does return it to the system.
>
> Our case might be a bit special, but for us it *is* that it can't keep
> the ram. We have no problem with it temporarily eating up a ton of ram
> if jobs are piling up because of a busted worker or whatever, but when
> that memory is no longer needed it would be ideal if it could be
> returned to the system, where it might be better served as system
> cache or any other thing that might actually be able to make use of
> that memory...
>
>>
That's gonna bite you in the ass at some point - specifically, there is a relatively high probability that the time gearman's queue expands due to a backend failure or overload will be exactly the time you need as much resource as possible for the other processes on that box. You are just asking for a cascade failure. In the end, the fact that gearman is a memory queue system (that may have a database backup) makes it prone to running out of resources if the workers die or go offline. If you can't live with that, you should probably pick another queue manager.
We're using gearman for a couple of things - one is to distribute synchronous compute intensive load (image scaling) to workers so that we don't get cascade failures and can run worker machines up to but not beyond the optimal load and the other is to dispatch async logging / stats events so we can free up web servers and not have a lot of machines contending for writes to log tables. In doing so we designed for a specific permitted backlog and ulimited gearman accordingly - we know when it will fail and how it will fail. Memory is cheap, knowing how our systems will behave under failure conditions is priceless.
That said, if I had to limit gearman to a low memory pool, I'd probably write a job that monitored the gearman backlog and, if it got too high, pulled jobs and spooled them to a flat disk file as fast as possible, to be fed back to the queue later when the storm is over. In other words, a flood control reservoir that soaks up surges, to borrow the analogy of another poster in this thread.
John Pettitt
Email: j...@p.tt
Some people said speed, but my informal testing shows that even with no
persistence backend gearman is already at least one order of magnitude
slower than rabbitmq. Of the persistence engines MySQL is the fastest
and is only marginally slower. SQLite and Postgres are dog slow.
Having said that, we are still talking about microseconds, and your
workers are going to be taking way longer than that to do the jobs, so
maybe the speed of job distribution is not that important. In fact, by
choosing gearman we have made the decision to go with one of the
slowest queue managers because it gives us other functionality.
Why not remove the in-memory queue? Make SQLite, or BerkeleyDB, or
something similar the default persistence engine and use it for the
queue. That way gearman is safe and slow by default. It already uses
memcache as a backend, so if people wanted to make it run in memory
they could; alternatively you could have it use the memory option behind
a flag.
BDB already has replication so theoretically you could leverage that
to get better failover too.
You probably don't need to remove the in-memory queue. Instead, change
the default. I haven't used either in gearman, but I assume gearman
initializes the database if one does not yet exist?
Just make sure that the port number is part of either the default
path/file name for the db, or is specified at some other level (default
table name?).
It's hard to justify, but I find myself running multiple instances on
one machine to follow business requirement of data isolation between
customers.
That said, gearmand still keeps all jobs in memory. The backend store
as currently implemented is only used for persistence and won't
resolve the original issue stated in this thread.
/* The code path in question: a finished job is kept on an in-process free
   list for reuse (up to GEARMAN_MAX_FREE_SERVER_JOB entries); even when it
   is free()d, the memory only goes back to malloc, not to the OS. */
if (server_job->options.allocated)
{
  if (server_job->server->free_job_count < GEARMAN_MAX_FREE_SERVER_JOB)
    GEARMAN_LIST_ADD(server_job->server->free_job, server_job,)
  else
    free(server_job);
}
-John
John Pettitt
Email: j...@p.tt
Each plugin has an initialization callback that can, and usually does,
create the db/tables/etc if they don't exist.
Currently they all do some kind of sync to disk every time a job is
added or deleted, which causes dramatic slowdowns. In the past I measured
the difference on good hardware: inserting/deleting jobs at a constant
rate, the pure in-memory queue was about 100x the speed of sqlite and
about 90x the speed of TokyoCabinet. Even with a battery-backed write
cache, the extra overhead of writing to the filesystem and dealing with
the IO controller made it unbearably slow if there was high churn.
So, I don't think this is an appropriate default. Gearman has never
guaranteed persistence and shouldn't start now IMO.
What might be interesting is to implement append-only log based recovery
storage. This is, I believe, similar to what qpid does. The idea would
be to hold several log files open (each one on a different SCSI target)
and use AIO to append to each log file, just selecting a log file based
on whether or not it is available for writes. Deletes would simply
append a "that one is deleted" log entry. The daemon would clean up log
files periodically by keeping track of how many "live" entries are in
each one, and rotating to new log files on a regular basis. Recovery is
also really fast because you just stream the log files and add/delete
entries from RAM until you have consumed all of the log files, then
delete them and start gearmand.
I made a prototype of this at one point (without the AIO); I should dig
it out and see if it still works. The other cool part about this is of
course you could point another gearmand at these log files and it could
do replication by reading them, reducing failover time.
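A very stripped-down sketch of what such a log could look like (this is not
the prototype mentioned above; the record layout and the add_job()/del_job()
hooks are invented for illustration, and there's no AIO, fsync, or
multi-file rotation):

/* Sketch only: single append-only job log, no AIO/fsync/rotation.
 * Record layout and the add_job()/del_job() hooks are invented here. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

enum log_op { LOG_ADD = 1, LOG_DEL = 2 };

struct log_hdr {
    uint8_t  op;        /* LOG_ADD or LOG_DEL */
    uint32_t key_len;   /* length of the job's unique key */
    uint32_t data_len;  /* workload length; 0 for LOG_DEL records */
};

/* Hypothetical hooks into the in-RAM queue; stubs here just print. */
static void add_job(const char *key) { printf("add %s\n", key); }
static void del_job(const char *key) { printf("del %s\n", key); }

/* Append one record to whichever log fd the caller picked as writable. */
static int log_append(int fd, uint8_t op, const char *key, uint32_t key_len,
                      const char *data, uint32_t data_len)
{
    struct log_hdr h = { op, key_len, data_len };
    if (write(fd, &h, sizeof(h)) != (ssize_t)sizeof(h)) return -1;
    if (write(fd, key, key_len) != (ssize_t)key_len) return -1;
    if (data_len && write(fd, data, data_len) != (ssize_t)data_len) return -1;
    return 0;
}

/* Recovery: stream the log, re-adding jobs on LOG_ADD and dropping them on
 * LOG_DEL, until the log is consumed.  Run this for every log file, then
 * delete them and start serving. */
static int log_replay(int fd)
{
    struct log_hdr h;
    while (read(fd, &h, sizeof(h)) == (ssize_t)sizeof(h)) {
        char *key  = calloc(1, h.key_len + 1);
        char *data = h.data_len ? malloc(h.data_len) : NULL;
        if (key == NULL) return -1;
        if (read(fd, key, h.key_len) != (ssize_t)h.key_len) return -1;
        if (h.data_len && read(fd, data, h.data_len) != (ssize_t)h.data_len)
            return -1;
        if (h.op == LOG_ADD) add_job(key);
        else                 del_job(key);
        free(key);
        free(data);
    }
    return 0;
}

int main(void)
{
    int fd = open("jobs.log", O_RDWR | O_CREAT | O_APPEND, 0600);
    if (fd < 0) return 1;
    log_append(fd, LOG_ADD, "job-1", 5, "payload", 7);
    log_append(fd, LOG_DEL, "job-1", 5, NULL, 0);
    lseek(fd, 0, SEEK_SET);   /* rewind and pretend we're recovering */
    log_replay(fd);
    close(fd);
    return 0;
}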
> Just make sure that the port number is in either the default path/file
> name to the db or specified at some other level (default table name?)
> It's hard to justify, but I find myself running multiple instances on
> one machine to follow business requirement of data isolation between
> customers.
>
> That said, gearmand still keeps all jobs in memory. The backend store
> as currently implemented is only used for persistance and won't
> resolve the original issue stated in this thread.
>
It wouldn't be all that hard to change the job add code to say "if queue
length is > X, create a disk overflow file", then "if there is a disk
overflow file, write jobs to disk instead of RAM", and then read from the
disk overflow once the queue gets back to 0.
But it will probably drag horribly, and the queue will get backed up
even more while this is happening. I'd say it's better (and cheaper) to
add more workers and keep the queue length down than to try and build a
gearmand box that has the IO muscle needed to make this not suck. That's
sort of the whole point of gearmand: to decouple frontends from
backends so you can scale them independently.
@clint, what you described about the AOF is pretty interesting, but in
my opinion we would be adding a lot of overhead to gearman, which is fast
just because it's simple. Why not just create a storage plugin for
redis? Redis has all the features you just mentioned, it's incredibly
fast, and building on top of it would let us avoid the mess of big
changes to gearman's code.
Regards
--
Gabriel Sosa
If you want different results, don't always do the same thing. - Einstein
What exactly is the advantage of keeping the queue in memory?