Background jobs in Yesod


Rune Harder Bak

Oct 9, 2011, 3:49:12 AM
to yeso...@googlegroups.com
Hi

How do people run background jobs with their Yesod apps?

Like every 4 hours: read something from the database, do some IO
action, update the database content.

I can think of
a)
Create a separate executable and run it with cron every 4 hours?
This would bypass the database pool, open its own connection, and be
a rather large binary, since it needs the persistent layer and what-not.
It also wouldn't scale well to multiple servers.

b)
Have the action triggered by visiting a special path, and hit that
path from a cron job.
We'd need some security here, like a never-expiring admin-user session cookie,
and we'd have to somehow spawn the action in the background and log errors,
so as not to leave the cron job hanging.
I'm not sure how to fork a handler action.
This should perhaps also not run the action directly, but queue it for a worker process...
But maybe this is a bad strategy.

How do people achieve this? Is there some Yesod way of doing it?

And what about the people deploying to Heroku?

Thanks!

-Rune

Mark Wotton

Oct 9, 2011, 4:02:28 AM
to yeso...@googlegroups.com
Anything wrong with just forkIO-ing a Haskell thread and sleeping? Do you need to be able to update the cron frequency without restarting the Yesod app?
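A minimal sketch of that approach (the helper name is made up, and there is no exception handling, so a crashing job kills its own loop):

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever)
import Data.IORef

-- Run `job` immediately and then again every `hours` hours, on a
-- green thread forked off the main Yesod process.
periodically :: Int -> IO () -> IO ()
periodically hours job = do
  _ <- forkIO $ forever $ do
        job
        threadDelay (hours * 60 * 60 * 1000000)
  return ()

-- Example: count how many times the job has fired so far.
demo :: IO Int
demo = do
  ref <- newIORef (0 :: Int)
  periodically 1 (modifyIORef' ref (+ 1))
  threadDelay 100000   -- the first run happens right away
  readIORef ref
```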

The standard Ruby way on Heroku is to start up a process that does nothing but sleep, FWIW. It's not the most efficient method in the world, and they charge per process, but it does keep everything neatly together.

mark

--
A UNIX signature isn't a return address, it's the ASCII equivalent of a
black velvet clown painting. It's a rectangle of carets surrounding a
quote from a literary giant of weeniedom like Heinlein or Dr. Who.
        -- Chris Maeda

Arash Rouhani

Oct 9, 2011, 9:17:49 AM
to yeso...@googlegroups.com
About forking an IO action: I have tried to implement something like that in one of my own projects. https://github.com/Tarrasch/DtekPortalen/blob/master/Application.hs#L83-88

Was it something like that you were looking for, Rune? I've considered factoring out that general helper function but never did.

I also wondered whether somebody had already done this before I wrote it. Or maybe you were looking for something else entirely, Rune?

Cheers,
Arash

Vagif Verdi

Oct 9, 2011, 12:44:19 PM
to Yesod Web Framework
I use option B. Works fine. Regarding security, just check the IP
address and restrict it to localhost only.

Greg Weber

Oct 9, 2011, 5:12:25 PM
to yeso...@googlegroups.com
I usually group background jobs into two categories: either a periodic (cron) job like this, or an action that you want performed as soon as possible but detached from the request/response cycle that initiated it. Periodic is simpler to deal with.

I don't like the b) suggestion of using a web request to perform an internal function. It creates security concerns and forces you to work through HTTP. It also keeps the application monolithic instead of taking the opportunity to have a loosely coupled separate process: if your background job crashes, your entire app crashes. But it seems some find it more convenient, which is hopefully something these discussions and some libraries can help with.

The downside of technique a) with cron invoking a binary is having to keep reloading the binary on every execution. This can be dealt with by making the binary a daemon that will respond to a signal from cron. Obviously this is much more of an issue for a large binary that runs every minute than a small one that runs every 4 hours. 
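A sketch of the signal-driven daemon idea, assuming a POSIX system and the unix package (the signal choice and the `pkill` line are illustrative, not from Greg's message):

```haskell
import Control.Concurrent (threadDelay)
import Data.IORef
import System.Posix.Signals

-- The job lives inside the long-running process; cron only has to
-- deliver a signal, e.g. a crontab line like
--   0 */4 * * * pkill -USR1 mydaemon
-- so the binary is never reloaded between runs.
installJobHandler :: IO () -> IO ()
installJobHandler job = do
  _ <- installHandler sigUSR1 (Catch job) Nothing
  return ()

-- Example: signal ourselves once and count how often the job fired.
demo :: IO Int
demo = do
  ref <- newIORef (0 :: Int)
  installJobHandler (modifyIORef' ref (+ 1))
  raiseSignal sigUSR1
  threadDelay 100000   -- give the handler thread a moment to run
  readIORef ref
```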

Arash shows how easy it can be to perform periodic tasks with Haskell, and this would be the cheapest approach for Heroku. There are a few concerns with a simple approach. One that could be fixed is that the task is periodic with respect to when the application is started rather than time of day. Another concern is scalability. It should be easy to start the backgrounder on a separate server, but it would also have a bloated binary. 

The bloated binary issue can be alleviated by creating a separate cabal package for the persistent layer (or at least the background job). Note that this does not mean that the application must depend on that package. You can use hs-source-dirs or symlinks so that your application can still directly compile your persistent layer even though it is also used by another package.

Rune Harder Bak

Oct 10, 2011, 4:37:00 AM
to yeso...@googlegroups.com
Thanks for all the answers, and thanks for sharing the code, Arash. It
seems I was forgetting I was coding my webapp in a real language!
That is certainly the easiest approach.
I just need to find a way to stick the runDB inside forkIO.

But this still only works with one server, as I don't want N
background processes trying to do the same thing if I have N servers.
The same is true with Heroku I guess, which means I'll
have to have another bloated binary.

> The bloated binary issue can be alleviated by creating a separate cabal
> package for the persistent layer (or at least the background job). Note that
> this does not mean that the application must to depend on that package. You
> can use hs-source-dirs or symlinks so that your application can still
> directly compile your persistent layer even though it is used by another
> package.

I'm not entirely sure what you mean here. Care to elaborate?

I haven't tried Heroku yet, but it seems like an interesting choice.
I guess I need to statically compile the binary.
Does that really work? I remember not being able to do that before
and having to switch to a VPS (I think it was a libmysql problem, which
I'm not using now).

Best,
Rune

Mark Wotton

Oct 10, 2011, 6:17:07 AM
to yeso...@googlegroups.com
On Mon, Oct 10, 2011 at 7:37 PM, Rune Harder Bak <ru...@bak.dk> wrote:
> Thanks for all the answers, and thanks for sharing the code, Arash. It
> seems I was forgetting I was coding my webapp in a real language!
> That is certainly the easiest approach.
> I just need to find a way to stick the runDB inside forkIO.
>
> But this still only works with one server, as I don't want N
> background processes trying to do the same thing if I have N servers.
> The same is true with Heroku I guess, which means I'll
> have to have another bloated binary.

Not hard to say "run this iff I'm the right server", though.
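One simple way to sketch that check (the environment variable name is invented):

```haskell
import System.Environment (lookupEnv)

-- Decide whether this instance should run the background loop; only
-- the designated host would have RUN_BACKGROUND_JOBS=1 in its env.
isWorker :: Maybe String -> Bool
isWorker = (== Just "1")

amWorker :: IO Bool
amWorker = fmap isWorker (lookupEnv "RUN_BACKGROUND_JOBS")
```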
 
> The bloated binary issue can be alleviated by creating a separate cabal
> package for the persistent layer (or at least the background job). Note that
> this does not mean that the application must to depend on that package. You
> can use hs-source-dirs or symlinks so that your application can still
> directly compile your persistent layer even though it is used by another
> package.

> I'm not entirely sure what you mean here. Care to elaborate?
>
> I haven't tried Heroku yet, but it seems like an interesting choice.
> I guess I need to statically compile the binary.
> Does that really work? I remember not being able to do that before
> and having to switch to a VPS (I think it was a libmysql problem, which
> I'm not using now).

Yes, it works fine (although MySQL is still not supported - it's all Postgres).
It's actually a really good way to get a bit of extra bang for your buck: a single process is free, and you can get an awful lot done with Haskell threads on the inside.

Greg wrote it up much better than me: http://www.yesodweb.com/blog/2011/07/haskell-on-heroku
I'm blackdog on #yesod, ping me if you have any trouble.

mark

Greg Weber

Oct 10, 2011, 10:07:38 AM
to yeso...@googlegroups.com
On Mon, Oct 10, 2011 at 1:37 AM, Rune Harder Bak <ru...@bak.dk> wrote:
> Thanks for all the answers, and thanks for sharing the code, Arash. It
> seems I was forgetting I was coding my webapp in a real language!
> That is certainly the easiest approach.
> I just need to find a way to stick the runDB inside forkIO.
>
> But this still only works with one server, as I don't want N
> background processes trying to do the same thing if I have N servers.
> The same is true with Heroku I guess, which means I'll
> have to have another bloated binary.

> The bloated binary issue can be alleviated by creating a separate cabal
> package for the persistent layer (or at least the background job). Note that
> this does not mean that the application must to depend on that package. You
> can use hs-source-dirs or symlinks so that your application can still
> directly compile your persistent layer even though it is used by another
> package.

You can set up a dependency structure where the two processes both depend on your persistent layer, instead of having your background jobs depend on your entire application:

App -> Persistent
BackgroundJobs -> Persistent

Both App and BackgroundJobs must be cabal packages. The persistent layer does not have to be a cabal package unless you want it to be. For the App, making Persistent a cabal package would lose automatic recompilation of the persistent layer. You can instead use symlinks or hs-source-dirs to make sure both App and BackgroundJobs find your persistent layer.

Felipe Almeida Lessa

Oct 10, 2011, 10:14:10 AM
to yeso...@googlegroups.com
On Mon, Oct 10, 2011 at 11:07 AM, Greg Weber <gr...@gregweber.info> wrote:
> Both App and BackgroundJobs must be a cabal package. The persistent layer
> does not have to be a cabal package unless you want it to be. For the App,
> making Persistent a cabal package will lose automatic recompilation of the
> persistent layer. You can instead use symlinks or use hs-source-dirs to make
> sure both App and BackgroundJobs find your persistent layer.

Can't you just have two executables within your .cabal file? They may
have different dependencies and different modules, but there's no
problem in having them in the same hs-source-dirs. This sounds a lot
easier to me.
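Concretely, the .cabal file could contain something like this (names and dependency lists are only illustrative):

```cabal
executable webapp
  main-is:        webapp.hs
  hs-source-dirs: src
  build-depends:  base, yesod, persistent

executable background-jobs
  main-is:        daemon.hs
  hs-source-dirs: src
  build-depends:  base, persistent
```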

Cheers, =)

--
Felipe.

Greg Weber

Oct 10, 2011, 10:21:32 AM
to yeso...@googlegroups.com
Yes, that is simpler and better advice. It doesn't have the same isolation properties, one of which relates to ease of fast testing, but it is at least a much simpler way to start.

Rune Harder Bak

Oct 10, 2011, 4:40:25 PM
to yeso...@googlegroups.com
I see. Yes, I was not going to have the background process depend on
the whole application, just the bits needed.
I would be going for one cabal package with multiple executables, though.
I never got the other approach to work reliably
(haskell-mode / wai-handler-devel / having other people easily install
it / etc.).
But as long as it's only running on one server, I'll stick with the
forkIO approach.

Arash Rouhani

Jan 15, 2012, 5:25:35 PM
to yeso...@googlegroups.com
Hello, and sorry for the revival.

I've just made a small project[1] out of this so you can
comfortably run background jobs in any Haskell program without
dealing with any forkIO code yourself. I have yet to release it on
Hackage, but I'll get to it soon.

I'm posting here in case anybody encounters this old thread and is
interested in solutions.

[1]: https://github.com/Tarrasch/timed-repeating

Cheers,
Arash


Rune Harder Bak

Jan 15, 2012, 10:38:08 PM
to yeso...@googlegroups.com
Great work! I haven't really looked into this issue much since
last time, so I'm happy for a solution!

I intend to put the results of the action into the database, but it
should be easy to extend it with a function
IO () -> IO ()

Another great extension could be to provide a function: memoIO :: (a
-> IO b) -> (a -> IO b)

where the new function would check if the result is already there and
relatively recent; if it is, use that, and if not, perform the action
(or mark it to be fetched on the next repeated action).
It should also clean out old results, as the input type a might be infinite.

So it should probably be configurable with the number of hours the result
is valid, and perhaps what to do when it's old.
And the input type should be restricted to Eq, or more likely Ord, to
allow for fast lookup.
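A rough sketch of what such a memoIO could look like. Note that it differs from the type proposed above by an outer IO, needed to allocate the cache, and that it never evicts stale keys, so memory stays unbounded, which is exactly the cleanup problem mentioned:

```haskell
import           Data.IORef
import qualified Data.Map as Map
import           Data.Time.Clock (NominalDiffTime, diffUTCTime, getCurrentTime)

-- Cache results per (Ord) input; recompute an entry once it is older
-- than `maxAge` seconds. Old entries are overwritten on recomputation
-- but never pruned, so a real version would also evict stale keys.
memoIO :: Ord a => NominalDiffTime -> (a -> IO b) -> IO (a -> IO b)
memoIO maxAge f = do
  cache <- newIORef Map.empty
  return $ \x -> do
    now <- getCurrentTime
    m <- readIORef cache
    case Map.lookup x m of
      Just (t, y) | diffUTCTime now t < maxAge -> return y
      _ -> do
        y <- f x
        modifyIORef' cache (Map.insert x (now, y))
        return y
```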

Anyway, thanks for making the library!

-Rune

Arash Rouhani

Jan 16, 2012, 5:41:51 AM
to yeso...@googlegroups.com
Thanks for your feedback, Rune!

First of all, do you have any good ideas for a name for this package? I
basically took what was at the top of my head. Do you think the name
should have something to do with caching instead?

On Mon 16 Jan 2012 04:38:08 AM CET, Rune Harder Bak wrote:
>
> Great work, I haven't really looked in to this issue too much since
> last, so I'm happy for a solution!
>
> I intend to put the results of the action into the database, but it
> should be easy to extend it with a functions
> IO () -> IO ()

OK, so you want to provide an `IO a`; every hour it should be run,
and the value should be passed to a user-provided `a -> IO ()`, where
you would provide something that writes to the db, but it could also
write to a file. Sounds reasonable.

Or what did you have in mind with an `IO () -> IO ()`?

Also, do you have an opinion on whether we should return an `IORef a`
or an `IO a`? That is,

runEveryHour :: IO a -> IO (IO a)
or
runEveryHour :: IO a -> IO (IORef a)

Both the advantage and the disadvantage is that `IO a` is more expressive;
in the equivalent case you would have something like `io = readIORef ref`.
The user isn't interested in whether my library stores the value in a
reference or not. On the other hand, you probably only want a value, so
with `IORef a` you don't have to worry that reading it will launch the
nuclear missiles, whereas an `IO a` could. But of course `IO a` is more
flexible. What do you think?
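For concreteness, the first variant could be implemented along these lines (a sketch, not the actual package code):

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Monad (forever)
import Data.IORef

-- Run the action once up front, keep the latest result in an IORef,
-- refresh it hourly on a background thread, and expose only a plain
-- `IO a` that reads the most recent value, so the caller never sees
-- the IORef at all.
runEveryHour :: IO a -> IO (IO a)
runEveryHour act = do
  ref <- act >>= newIORef
  _ <- forkIO $ forever $ do
        threadDelay (60 * 60 * 1000000)   -- one hour
        act >>= writeIORef ref
  return (readIORef ref)
```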


>
> Another great extension could be to provide a function: memoIO :: (a
> -> IO b) -> (a -> IO b)
>
> where the new function would check if the result is already there, and
> relatively recent,
> if it is, use that, if not perform the action (or mark it to be
> fetched on next repeated action).
> It should also clean out old results, as the input type a might be
> infinite.

Hmm, that is interesting; it's worth checking whether this already exists.
But it seems good for other use cases, so maybe it deserves its own
package? It also seems like a different thing: there you are probably
seriously worried about your IO function crashing and want to handle that
yourself, as opposed to when you just want a scraper running every hour.


>
> So it should probably be configurable with number of hours the result
> is valid, and perhaps what to do when it's old.
> an the input type should be restricted to Eq or more likely Ord to
> allow for fast lookup.

Yeah, that's interesting too.


>
> Anyway, thanks for making the library!
>
> -Rune
>

Cheers,

Arash

Rune Harder Bak

Jan 16, 2012, 10:20:37 AM
to yeso...@googlegroups.com
>> I intend to put the results of the action into the database, but it
>> should be easy to extend it with a functions
>> IO () -> IO ()
>
> Ok, so you want to provide an `IO a`, and every hour that should be run, and
> that value should be passed to a user-provided `a -> IO ()`. Where you want
> to provide something that writes to the db, but it could also write to a
> file, sounds reasonable.
>
> Or what did you have in mind with a `IO () -> IO ()`?

Yeah, something like that would happen internally in the IO () action,
but I don't see the value in forcing the user to split up his action in
two.
I mean, the user might want to stream the input to a file, or what do I know.
But perhaps I'm missing something?

> Also, do you have any opinion if we should return a `IORef a` or an `IO a`.
> that is
>
> runEveryHour :: IO a -> IO (IO (a))
> or
> runEveryHour :: IO a -> IO (IORef a)

I liked your earlier example where I could just treat it like a value,
but I might be better off being aware that it actually isn't.
I've never really touched the imperative-style part of Haskell, so
I can't be of much help here.
Perhaps it should be an option.

>> Another great extension could be to provide a function: memoIO :: (a
>> -> IO b) -> (a -> IO b)
>>
>> where the new function would check if the result is already there, and
>> relatively recent,
>> if it is, use that, if not perform the action (or mark it to be
>> fetched on next repeated action).
>> It should also clean out old results, as the input type a might be
>> infinite.
>
> Hmm, that is interesting, it's worth checking if this already exists or not.
> But this seems good for other use cases, maybe it deserves it's own package?
> It seems like a different thing here, as you probably are seriously worried
> about your IO function crashing and want to handle it yourself, as opposed
> to when you want a scraper running every hour.

You could also view your use case as a special case of this more
general pattern, albeit a simple one where all the potential problems
don't really occur.
When I first thought of it I assumed it must exist somewhere, but I
couldn't find any mention of anything like it. Perhaps I'm using the
wrong search terms.

Thinking of the implementation, one could keep a map of values and clean
it up every once in a while, but I don't really know how to keep a fixed
bound on memory.
And this all seems very low-level, like there should be some clever
Haskell laziness trick doing it for me.
But as I said, I haven't really worked with this kind of code before.

Testing out your code right now!

-Rune

Arash Rouhani

Jan 16, 2012, 10:56:28 AM
to yeso...@googlegroups.com
On 01/16/2012 04:20 PM, Rune Harder Bak wrote:
>>> I intend to put the results of the action into the database, but it
>>> should be easy to extend it with a functions
>>> IO () -> IO ()
>> Ok, so you want to provide an `IO a`, and every hour that should be run, and
>> that value should be passed to a user-provided `a -> IO ()`. Where you want
>> to provide something that writes to the db, but it could also write to a
>> file, sounds reasonable.
>>
>> Or what did you have in mind with a `IO () -> IO ()`?
> Yeah, something like that would happen internally in the IO () action,
> but I don't see value in forcing the user to split up his action in
> two.
> I mean, the user might want to stream the input to a file or what do I know.
> But perhaps I'm missing something?
>
>> Also, do you have any opinion if we should return a `IORef a` or an `IO a`.
>> that is
>>
>> runEveryHour :: IO a -> IO (IO (a))
>> or
>> runEveryHour :: IO a -> IO (IORef a)
> I liked your earlier example that I could just treat it like a value,
> but I might be better off beeing aware that it actually isn't.
> I've never really touch upon the imperative-style part of Haskell, so
> I can't be of much help here.
> Perhaps it should be an option.
Hmm, I'm thinking of changing the API here (after its half-day of
publicity :p) to be `IO (IO a)` rather than exposing the IORef. One simply
has to trust the library author (me) that the inner `IO a` will most
likely not crash, even if the type system won't testify to it.

>
>>> Another great extension could be to provide a function: memoIO :: (a
>>> -> IO b) -> (a -> IO b)
>>>
>>> where the new function would check if the result is already there, and
>>> relatively recent,
>>> if it is, use that, if not perform the action (or mark it to be
>>> fetched on next repeated action).
>>> It should also clean out old results, as the input type a might be
>>> infinite.
>> Hmm, that is interesting, it's worth checking if this already exists or not.
>> But this seems good for other use cases, maybe it deserves it's own package?
>> It seems like a different thing here, as you probably are seriously worried
>> about your IO function crashing and want to handle it yourself, as opposed
>> to when you want a scraper running every hour.
> You could also view your use-case as a special case of this more
> general pattern.
> But albeit a simple one, where all the potential problems doesn't really occur.
> When I first thought if it I thought it must exist somewhere, but I
> couldn't find any mention of
> something like that anywhere. Perhaps I'm using the wrong search-terms.
As for the `memoIO` function, I may look at it later, but not for now.

>
> Thinking of implementation one could keep a map of values and clean it up
> every once in a while, but I don't really know how to keep a fixed
> bound on memory.
> And this all seems very low-level like there should be some clever
> Haskell-lazyness trick doing it for me.
> But as I said, i haven't really worked with this kind of code before.
>
> Testing out your code right now!
Cool!

Please give feedback or open any issue if you want. :)

Cheers,
Arash

Christian Brink

Jun 20, 2013, 7:46:55 PM
to yeso...@googlegroups.com
Hi Greg,

I know this is an old thread, but I'm bumping up against very similar questions today and could use help.  I wrote a StackOverflow question about it.  Any guidance you can offer either here or there is much appreciated.

Thanks!
Christian

Erik de Castro Lopo

Jun 20, 2013, 8:28:11 PM
to yeso...@googlegroups.com
Christian Brink wrote:

> Hi Greg,
>
> I know this is an old thread, but I'm bumping up against very similar
> questions today and could use help. I wrote a StackOverflow question
> <http://stackoverflow.com/questions/17220511/layout-for-separate-app-and-backgroundjobs-packages>
> about it. Any guidance you can offer either here or there is much
> appreciated.

As I stated on SO, I have an application which consists of a webapp and
a separate daemon process that collects data and inserts it into the
database it shares with the webapp.

I basically have all the code in the same source tree and have two
files which define main :: IO (), one called webapp.hs and the other
called daemon.hs. The cabal file then defines and builds the two
separate executables.

Unfortunately I can't share the code, as it's an internal project I'm
doing in my day job.

HTH,
Erik
--
----------------------------------------------------------------------
Erik de Castro Lopo
http://www.mega-nerd.com/

Greg Weber

Jun 21, 2013, 9:13:28 AM
to Yesod Web Framework
Christian, you are asking a number of questions at once. It seems like you have the right idea; start implementing and ask specific questions where you actually get stuck. Erik is suggesting multiple executables in the same cabal file, which may be simpler for you.



--
You received this message because you are subscribed to the Google Groups "Yesod Web Framework" group.
To unsubscribe from this group and stop receiving emails from it, send an email to yesodweb+u...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.



Tero Laitinen

Feb 11, 2014, 9:34:03 AM
to yeso...@googlegroups.com, ru...@bak.dk
I'm storing the queue of tasks in the database and having a separate process execute the tasks. See the implementation at http://lpaste.net/99771 .

The main thread schedules a number of the highest-priority tasks and writes their task ID numbers to a bounded channel. The task runner threads read from the channel and then execute the corresponding tasks.
Tero

Ian Ross

Feb 11, 2014, 9:48:39 AM
to yeso...@googlegroups.com
Tero's solution is more or less what I've done in another setting (although there the channel is fed by an fsnotify thing watching for file changes under a directory tree, and the worker threads run PhantomJS processes to generate page-view thumbnails). For running jobs at such low frequencies, though, I might just write a simple scheduler, probably based on a database table. In terms of existing solutions, there's even a package called hcron, though it doesn't seem to have seen much use.



Greg Weber

Feb 11, 2014, 10:22:21 AM
to Yesod Web Framework
Although the package I released for a database queue is MongoDB-specific, I don't think it would be hard to fork it and make it work with another database: http://hackage.haskell.org/package/mongodb-queue

Using the database as a queue gets a bad rap because it doesn't scale well, but I think it is often better to start with this approach for its simple deployment, and switch to something else when you actually hit scaling issues.