Running apps without blocking the task engine


Havoc Pennington

Aug 7, 2014, 12:53:45 PM
to sbt-dev, James Roper, Peter Vlugter
Hi,

So here's another thing UIs such as Eclipse and Activator UI mode will
need to do with https://github.com/sbt/sbt-remote-control - run an
app, while still being able to interact with the build.

In fact I think this is useful on the command line as well; I know of
people who do "sbt ~run" in one terminal and "sbt ~test" in another.
Right now that is clearly unsafe, but it happens to work most of the
time, except when the compiles happen to stomp each other in target/.

There are many levels at which to tackle this, from the more specific to the more generic.

Play has already implemented something specific to Play.
https://github.com/playframework/playframework/pull/3108/files

In this solution, "run" forks off a server and the run task finishes.
There's then a separate playStop task which stops the forked-off
server. This can be used from command line sbt.

While a good start, this has various problems as a full solution:

* it only works for Play
* UIs can't tell whether "run" has finished or has just forked something off
* in sbt-remote-control, the "UIContext" is not expected to last
beyond the task execution; the UIContext is what associates a task
execution ID with events. So events from the forked-off thing will
continue to arrive after UIs think the task has already ended, and UIs
won't know when those events are going to stop.

Here's a strawman solution idea for people to improve.

* tasks can spawn background jobs. It is not sbt's concern whether
these are processes or threads or what.

* the interface from sbt to a background job - an implementation is
provided by the task - would be something like:

trait BackgroundJob {
  def humanReadableName: String
  def awaitTermination(): Unit
  def stop(): Unit
  def isRunning(): Boolean
  // called after stop or on spontaneous exit
  def onStop(callback: () => Unit): CancelableSubscription
  def tags: // don't know the type
}

So for Play, it might be something like:

def humanReadableName = "Play server (dev mode, port 9999)"
def awaitTermination = // block on server to close
def stop = // close server
def tags = // tagged as "the play server"
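
And for a plain in-process run, a thread could back the same interface. Here's a minimal sketch, assuming tags ends up as Seq[String] and that CancelableSubscription is a small trait with a cancel() method (neither is pinned down above); this is illustrative only, not a real sbt API:

import scala.collection.JavaConverters._

// Thread-backed sketch of the strawman BackgroundJob trait.
class ThreadJob(name: String, body: () => Unit) extends BackgroundJob {
  private val callbacks = new java.util.concurrent.CopyOnWriteArrayList[() => Unit]()
  private val thread = new Thread(new Runnable {
    def run(): Unit =
      try body()
      finally callbacks.asScala.foreach(cb => cb()) // fire onStop on exit, forced or spontaneous
  })
  thread.setDaemon(true)
  thread.start()

  def humanReadableName: String = name
  def awaitTermination(): Unit = thread.join()
  def stop(): Unit = thread.interrupt() // cooperative; the body must respond to interruption
  def isRunning(): Boolean = thread.isAlive
  def onStop(callback: () => Unit): CancelableSubscription = {
    callbacks.add(callback)
    new CancelableSubscription { def cancel(): Unit = callbacks.remove(callback) }
  }
  def tags: Seq[String] = Seq("run")
}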

* sbt provides "job control" similar to shell job control, so maybe
something like:

class BackgroundJobHandle(private val job: BackgroundJob) {
  def spawningTask: TaskKey[_] // or Task[_]?
  def id: Long = // assigned by sbt
  def humanReadableName = job.humanReadableName
}

val listJobs = taskKey[Seq[BackgroundJobHandle]]("list any background jobs")
val stopJob = inputKey[Unit]("stop job by id")
val stopJobByTag = inputKey[Unit]("stop job by tag")
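
At the sbt prompt, usage might look something like this (prompt and output are entirely made up here, just to show the shape):

  sbt> listJobs
  [info] 1: Play server (dev mode, port 9999)
  sbt> stopJob 1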

* tasks provide background jobs to sbt via the streams/UIContext
mechanism (i.e. sbt provides a per-task-execution API to tasks which
has some sort of job-registration feature)

* sbt or sbt-remote-control will send events to UIContext as jobs
start and stop

This would work as follows for "run":

* run may just fork off and immediately "succeed", but it will have
given sbt a BackgroundJob representing the forked-off thing

* UIs can say "ok run has created this job; I'll assume the app is
running as long as that job exists"

* UIs can offer a button to stop the background job

* background job is allowed to send events on the original task's
UIContext (OR: maybe when you create a job you have to get a new
UIContext from sbt?)

* playStop would just be an alias for "stopJobByTag playServer" or
something. Or we could have more general runStop which would be an
alias for "stopJobByTag theRunJob"

I believe the above could be implemented entirely in
sbt-remote-control / UIContext as a prototype, but the idea (as with
several things in sbt-remote-control) would be to migrate it into sbt
core so the default tasks can use it.

Maybe we could introduce a new key backgroundRun which would become
the "real" run, while the current run task would be changed to, by
default, do a backgroundRun and then awaitTermination on the resulting
job. This would avoid breaking the current semantics of run.
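
A minimal sketch of that non-breaking variant, assuming a hypothetical backgroundRun input key and assuming BackgroundJobHandle forwards awaitTermination() from its underlying job:

// backgroundRun becomes the "real" run; the stock run just blocks on the
// job it spawned, preserving today's semantics.
val backgroundRun = inputKey[BackgroundJobHandle]("fork off the app and return a job handle")

run := {
  val handle = backgroundRun.evaluated // forward the command-line arguments
  handle.awaitTermination()            // block until the app exits, as run does today
}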

Or we could do things as Play has and change the semantics of the
current run task to just fork off and then return (possibly controlled
by a separate "run mode" setting).

I *think* this proposal solves the immediate problem with "run". It
doesn't solve some other problems we've discussed in the past:

* it does not allow backgrounding the task itself; that is, a task's
result cannot be computed asynchronously. Background jobs do not have
results, though they could send events with values in them.

* it does not get involved in the process API
(http://www.scala-sbt.org/0.13/docs/Process.html); a spawned process
isn't automatically a background job managed by sbt. Background jobs
can be threads rather than processes if they want. In fact run could
still honor "fork in run" to use either a thread or a process.

The solution could be de-generalized; i.e. instead of a general
BackgroundJob and tracking N jobs, we could just very specifically be
aware of a single RunJob. This would more or less be pulling what Play
has now down into generic sbt. I was just generalizing since it seemed
pretty easy and I can imagine wanting to background other things.

Anyway, feel free to improve this solution, or propose an entirely
different one. Just my current idea.

Thanks,
Havoc

Havoc Pennington

Aug 7, 2014, 9:07:50 PM
to James Roper, sbt-dev, Peter Vlugter
Hi,

Thanks for follow-up!

On Thu, Aug 7, 2014 at 8:35 PM, James Roper <ja...@typesafe.com> wrote:
> Biggest difficulty with this interface is the semantics - if it's already
> stopped before the callback is registered, what happens? If the answer is
> invoke onStop immediately, how do we ensure/help tasks to safely implement
> this to avoid race conditions, is synchronisation ok or may that cause
> deadlocks? Is it ok to invoke the callback synchronously?

Yeah - I'm comfortable sorting that kind of thing out, if we like the
larger approach. We could provide an abstract base class or other
helpers to ensure it's gotten right. (If not obvious, I didn't agonize
over interface details yet.)

In case it's not clear, I was going to limit this callback thing to
sbt internals - tasks and build code aren't supposed to use it directly.

> Run may also fail - eg port already taken.

Hmm indeed.

I think here the job could send an error event and then stop itself.
Or we could introduce the idea of an "exit code" kind of thing (it
could be an Option[Throwable] if we prefer). I'm worried things might
get more complex if we allow jobs to return results rather than simply
failing or not, though.

From the command line, it's sort of unclear what to do with the errors
(or logs, for that matter) from a job. What did you do for Play - just
go ahead and dump them to stderr even though somebody might be typing?

For a UI it seems simpler, we can just display the error someplace.

Another question is how/when this ties back to State. I'm thinking the
stopping, or failure, of a job is allowed to affect the State in
between commands; while another command is running, the job status
would just be queued somewhere to be seen by the next command but not
the current one. Or something like that. This would affect the
visibility of job state to tasks: UIContext would be able to see the
job status sooner by observing events, but tasks would not see the list
of jobs or the status of jobs change while the task is in flight. Not
sure about this; it's a problem to work through.

We should seriously consider an alternative to jobs where we instead
make tasks backgroundable - add the idea of executing a task outside
of the main sbt loop ... I haven't fully played out what that would
look like. Josh was talking about this at one point. I'm not sure
whether it's harder or easier than this jobs idea.

> Agreed. Don't break run.
> Also note here that it's not uncommon to provide
> your own run implementation, often delegating to the original run, to do
> some setup/tear down work, eg start/stop a database. This should also still
> work - though obviously this won't work with sbt-remote-control.

Hmm. That suggests we might need some sort of hooks for setup and
teardown - either around "any job with tag xyz" or specifically around
the run job.

Ideally people could easily port their "run" override to a
"backgroundRun" override or to overriding before/after run hooks or
something.

>> The solution could be de-generalized; i.e. instead of a general
>> BackgroundJob and tracking N jobs, we could just very specifically be
>> aware of a single RunJob. This would more or less be pulling what Play
>> has now down into generic sbt. I was just generalizing since it seemed
>> pretty easy and I can imagine wanting to background other things.
>
>
> Running multiple background jobs is required, people run multiple play sbt
> sub projects at once with independent sbt invocations, if sbt remote control
> means there's only one sbt, then multiple background jobs is required.
>

I guess I meant "one per project" or "one per run task" but yeah. I
think it's about as easy to make this general so I'm not seeing the
value in making it hardcoded to run. Though I am usually a fan of
hardcoding ;-)

> One thing not addressed here is how it will deal with invoking the same
> background run task twice... is it up to the client to check current
> background tasks to not do that, or will sbt provide some mechanism to
> ensure that doesn't happen?

Good point. I threw in the "tags" mechanism to give a way to check
current jobs, and I guess I was thinking the client would have to do
it (though the stock "backgroundRun" task would do it for you for
run).

Another question is about scopes: are jobs scoped to project/task?
Maybe that should actually be used rather than tags, so a job in the
"run" scope or a job in the "run in myproject" scope would be unique.
Or possibly tags should be scoped. Or maybe there's no job scope at
all: if the job already has a pointer to its parent task, and the task
is scoped, then you don't need a separate scope on the job. That makes
sense to me, actually - you'd do uniqueness by looking for a job from
the fully-scoped task, and then you could also have tags if necessary;
but maybe we don't need tags at all, just the scoped task the job came
from.
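
For example, uniqueness via the spawning task could be as simple as this hypothetical helper (assuming the handle's spawningTask is a fully-scoped TaskKey):

// Look up an existing job by the scoped key of the task that spawned it,
// instead of by tag. Purely illustrative.
def findExistingJob(jobs: Seq[BackgroundJobHandle], key: ScopedKey[_]): Option[BackgroundJobHandle] =
  jobs.find(_.spawningTask.scopedKey == key)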

Gah this is getting complicated ;-)

Oh I see you were about to say the same -

> And what the interface doesn't make clear is
> it's possible to list running tasks,

jobs! I called them jobs so we don't get confused ;-)

> but how does the client know which sub
> project/configuration/etc that task is? If you start your IDE and Play is
> already running, how does the IDE know which sub project to associate that
> running task with, so that, for example, if it attaches a debugger, it
> associates the debugger with the right classpath/sources?

If we associate each job with the fully-scoped task key that spawned
it, I think that would probably solve it ... ?

Havoc

James Roper

Aug 7, 2014, 10:01:52 PM
to Havoc Pennington, sbt-dev, Peter Vlugter
For Play, the support we implemented is not intended for end users - it was only written so that we could test dev mode using a scripted test, since scripted tests needed the run task to be in the background so that it could then do file modifications and assertions to check that they were handled appropriately.  So when I implemented it, I didn't think at all about how users might interact with it, I just thought about how my scripted tests might interact with it.
> If we associate each job with the fully-scoped task key that spawned
> it, I think that would probably solve it ... ?

Probably.  There are a few requirements: an IDE is going to look at a project and say "here are some things that I want to expose to users to be able to run", and then it wants to know whether anything in that list is already running.  Then there's also the use case where it wants to interact with an existing running "job".  And then it also needs to know how to stop and restart the job.

On the topic of debugging, I'm guessing if there is an existing running task that the IDE has started, the IDE would really like to know how to attach a debugger to it.  As an end user, you could configure the task to open a remote debugging port when it forked the JVM, and then you'll have to tell the IDE which port to connect to - this is basically what we do today, but that's not what I would call a nice user experience.  I think jobs should have meta data associated with them that an IDE can query, at very least, a debug port if available.

More generally, there should probably be support at some level in SBT to select and use a debug port that doesn't involve the user configuring impossible-to-remember strings.  If IDEs, for example, are going to use SBT remote to run tests, then I would want, regardless of whether my test is forked or not, to be able to hit the debug button in my IDE and debug it.  This means that if it's not forked, sbt remote needs to start with remote debug enabled and advertise the port selected for it to debug on, and if it is forked, then sbt remote needs to select a debug port for the forked process and tell the caller what that port is.






--
James Roper
Software Engineer

Typesafe – Build reactive apps!
Twitter: @jroper

Havoc Pennington

Aug 8, 2014, 11:17:32 AM
to James Roper, sbt-dev, Peter Vlugter
Hi,

I realized we actually have a little written down (from a while back)
about the idea of backgrounding tasks:
https://github.com/sbt/sbt/wiki/Client-server-split#ideas-backgroundable-tasks

When that was written, we hadn't really coded any of sbt server and so
it's necessarily pretty high-level. We could be more specific now.

I think the main difference in the backgroundable tasks idea is that a
backgrounded task can have a result. So you define a single task with
a result type, but mark it as a background task. When we go to execute
a background task, we tell it whether it's being started, continued,
or stopped; but regardless of state it has to return the result type.
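
To make that concrete, a rough sketch of what a backgroundable task's shape could be (all of these types are hypothetical; nothing like them exists today):

// A backgroundable task is told which phase it is in, but must produce its
// result type T in every phase.
sealed trait BackgroundPhase
case object Started extends BackgroundPhase
case object Continued extends BackgroundPhase
case object Stopped extends BackgroundPhase

trait BackgroundableTask[T] {
  def execute(phase: BackgroundPhase): T
}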

Some of the other mechanics might end up pretty similar to the "jobs" idea.

I think for "run" it's hard to motivate background tasks with a
result, since run returns Unit. Can we come up with any non-run use
cases for backgrounding? Maybe "test" - which also is Unit.

Currently I'm thinking of a forked-off running copy of the app as a
side effect of a task (more like sending an event or creating a file),
rather than as itself a task. The only reasons sbt tracks these "jobs"
at all are:
* the UI needs to be able to list them and stop them
* the jobs need to be able to send events to the UI
But conceptually the jobs are like the filesystem, i.e. a thing
"outside" the task engine which tasks can affect.

Havoc

Havoc Pennington

Aug 8, 2014, 11:21:47 AM
to James Roper, sbt-dev, Peter Vlugter
On Thu, Aug 7, 2014 at 9:31 PM, James Roper <ja...@typesafe.com> wrote:
> For Play, the support we implemented is not intended for end users - it was
> only written so that we could test dev mode using a scripted test, since
> scripted tests needed the run task to be in the background so that it could
> then do file modifications and assertions to check that they were handled
> appropriately. So when I implemented it, I didn't think at all about how
> users might interact with it, I just thought about how my scripted tests
> might interact with it.

Ah - key missing context ;-)

> On the topic of debugging, I'm guessing if there is an existing running task
> that the IDE has started, the IDE would really like to know how to attach a
> debugger to it. As an end user, you could configure the task to open a
> remote debugging port when it forked the JVM, and then you'll have to tell
> the IDE which port to connect to - this is basically what we do today, but
> that's not what I would call a nice user experience. I think jobs should
> have meta data associated with them that an IDE can query, at very least, a
> debug port if available.

Yep - debugging is part of the motivation for this. I think it would
simplify matters considerably for debugging and also agents if we
default to forking, and maybe allow some features to gracefully fail
if not forking. In the end, unforked run/test is just a performance
optimization, and one that makes things fragile.

But there's no real reason we can't start sbt server itself with a
debug port (other than "yet another thing to code" of course).

Requirement added, anyway: we need to be able to get the debug port
for a background job, perhaps via some sort of "attributes" mechanism
(which sbt already has, I suppose).
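
For reference, the manual setup James describes today is roughly this in a build (a standard JDWP flag; the port is arbitrary):

// Open a remote-debug port on the forked app; the user then has to point the
// IDE at port 5005 by hand. This is the experience we'd like to replace.
fork in run := true
javaOptions in run += "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"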

Havoc

Havoc Pennington

Aug 8, 2014, 12:00:14 PM
to sbt-dev, James Roper, Peter Vlugter
One mechanical detail of this is how a background job gets a
UIContext, streams, etc. that it's supposed to use.

History
===

For those who don't know, "streams.value" is a special case in sbt:
unlike all other tasks, it has a unique instance for every task
*execution*. That is, if I type "foo" at the sbt console, sbt might
decide that it has to run a dependency graph of 10 tasks ending with
the "foo" task. For *every one* of those 10 tasks, sbt will
create a new "streams" instance and provide it to that task. All other
dependencies are computed once for the entire graph; if foo depends on
`bar.value`, and some other task in the graph does too, `bar` will be
run *once* and both will see the same value. streams.value is not run
only once; it's regenerated for every task execution.
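
For example (ordinary sbt 0.13 usage, just to illustrate the per-execution semantics):

// bar is computed once and shared by every task in the graph that depends on
// it; streams is regenerated for each task execution.
val bar = taskKey[Int]("shared dependency")
val foo = taskKey[Unit]("example task")

bar := 42
foo := {
  val b = bar.value     // same value seen by every dependent task
  val s = streams.value // a fresh TaskStreams instance just for this execution of foo
  s.log.info("bar is " + b)
}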

What's happened historically is that anything we wanted to have these
semantics got stuck into the TaskStreams trait:
https://github.com/sbt/sbt/blob/0.13/tasks/standard/src/main/scala/sbt/std/Streams.scala#L19

For example "cacheDirectory" has nothing really to do with "log" but
we want both to be per-task-execution so they go in TaskStreams.

We've discussed breaking TaskStreams up into things like a cache
directory interface and a log interface and whatever else, and then
making UIContext another thing like this. (UIContext =
https://github.com/sbt/sbt-remote-control/blob/master/commons/ui-interface/src/main/scala/sbt/UI.scala#L5
). UIContext itself could probably be broken into Interaction and
EventSink or something.

sbt server wants UIContext to have this "per execution" semantics so
it can give each task execution a unique ID, and then tag all events
from the task with that ID.

How it affects backgrounding
===

Background jobs potentially mess this up because if a task just hands
its own streams and UIContext to the job, then the job is going to
keep those around far past sbt's intended scope for those. This
probably actively breaks things for logs (because sbt might even clean
up the log file), and it creates an undesirable result for events
(because sbt server can't distinguish events from the task from events
from the job).

If we backgrounded *tasks* instead of jobs it may well help with this,
because the backgrounded task could get its own streams and UIContext
in the usual way, by depending on them and getting new instances. sbt
and sbt server could create special versions of the logger or the
UIContext which are intended to last indefinitely. So this suggests
again the "backgroundable tasks" concept from
https://github.com/sbt/sbt/wiki/Client-server-split#ideas-backgroundable-tasks

If we don't need the background work to have a result, though, then
plain background jobs might be simpler than backgroundable tasks: we no
longer need to track the whole task lifecycle or deal with results.

One possibility is that BackgroundJob would have an anonymous
Task[Unit] in it; executing that task is supposed to launch the job
(but not block on it). So this anonymous task is just a way for the
background job to capture dependencies such as streams and UIContext,
and sbt could use special designed-to-hang-around streams and
UIContext for this anonymous task.

trait BackgroundJob {
  // execute me to spawn this job
  def bootstrapTask: Task[Unit]
  // rest of BackgroundJob interface follows...
}

Or perhaps better, maybe the bootstrap task is what's provided to sbt,
and it returns the job (so Task[BackgroundJob]). This kind of task is
never in the build config, it's only dynamically created and handed to
sbt by another task?

trait UIContext {
  // queue this task for execution; on execution, spawn background
  // work and return the BackgroundJob for that work
  def spawn(task: Task[BackgroundJob]): Unit
  // (probably that method should not be in UIContext but in another
  // dedicated interface)
}

A task provided to spawn() is given the special-case streams and
UIContext that are designed to persist beyond task execution and get
cleaned up on BackgroundJob exit rather than on task exit.

Not sure, will keep playing with it.

Havoc

Havoc Pennington

Aug 10, 2014, 7:34:19 PM
to sbt-dev, James Roper, Peter Vlugter
So I spent some hours trying to understand how the streams task works now:

https://gist.github.com/havocp/a835221981458cf1cd87

(no promises I got it all right)

Based on that, there are two issues for background jobs:

* how do we auto-scope UIContext (or perhaps a more specific
BackgroundJobManager interface) to each task, in the way that streams
is scoped now? (Two options I see: genericize the `streams` special
case so any key can work the same way, or override streams and hang
everything off streams.)
* currently sbt closes Streams and TaskStreams at the end of
execution, so as best I can tell they are going to be flat-out invalid
in a background job, or in the "best" case they will auto-reopen and
then leak. So we need some way to keep logs open for a background job
and still close them when the job completes. Passing streams.log to a
background job will be broken right now, I think.

Havoc

Naftoli Gugenheim

Aug 11, 2014, 1:02:37 AM
to sbt...@googlegroups.com, James Roper, Peter Vlugter
I agree that sbt could have better task management, but if anyone needs to run apps in the background today, have you seen https://github.com/spray/sbt-revolver?

Also if the task management gets more complex, perhaps the shell could do some ncurses kind of thing, like Buck.



  




Havoc Pennington

Aug 11, 2014, 11:16:53 AM
to sbt-dev, James Roper, Peter Vlugter
On Mon, Aug 11, 2014 at 1:02 AM, Naftoli Gugenheim <nafto...@gmail.com> wrote:
> I agree that sbt could have better task management, but if anyone needs to run apps in the background today, have you seen https://github.com/spray/sbt-revolver?


I had seen it but good reminder. It looks like they grappled with the "streams not valid in background" issue: https://github.com/spray/sbt-revolver/blob/master/src/main/scala/spray/revolver/SysoutLogger.scala
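
The shape of that workaround is basically a logger that bypasses sbt's managed streams. A minimal stdout-backed sketch of the same idea (not their actual code):

import sbt._

// An sbt Logger that writes straight to stdout, so it stays usable after sbt
// has closed the task's streams. Crude, but it survives backgrounding.
object StdoutLogger extends Logger {
  def trace(t: => Throwable): Unit = t.printStackTrace(System.out)
  def success(message: => String): Unit = println("[success] " + message)
  def log(level: Level.Value, message: => String): Unit =
    println("[" + level + "] " + message)
}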
 
I think that for the sbt-remote-control future, sbt should have most of this sbt-revolver functionality "out of the box".


> Also if the task management gets more complex, perhaps the shell could do some ncurses kind of thing, like Buck.


Cool :-) One nice side effect of the sbt-remote-control work is that clients are easy to write; the sbt-remote-control-based terminal is a bit of a toy right now, but you can see the rough scale of it: https://github.com/sbt/sbt-remote-control/blob/master/terminal/src/main/scala/com/typesafe/sbtrc/client/SimpleSbtTerminal.scala

So writing a client that embeds in your favorite editor, or writing an ncurses frontend, or anything like that should be possible without having to modify the sbt core. Though of course we want to improve the core too.

Another kind of client I'd like to have is one that's "git style", with a bunch of toplevel commands used in bash rather than a separate shell; making that practical might require writing a client in C so it starts up quickly. Anyway, I imagine I'll never have time, but it is a nice thing that sbt server will enable in theory.

Havoc

 

Naftoli Gugenheim

Aug 11, 2014, 9:35:42 PM
to sbt...@googlegroups.com, James Roper, Peter Vlugter
Yeah. Another possibility: an sbt shell that is just a Scala REPL.

 


Havoc Pennington

Sep 8, 2014, 12:01:54 PM
to sbt-dev, James Roper, Peter Vlugter
Hi,

Here is an implementation of the background jobs concept for any comments:

https://github.com/sbt/sbt-remote-control/pull/188

The idea is to move this entire "ui-interface" plugin into sbt
itself at some point.

Inspired by this work, we are thinking of introducing a mechanism
tentatively called "services" which would cover:
- getting a Logger (or Streams)
- getting the SendEventService and InteractionService (formerly known
as UIContext)
- getting BackgroundJobService

Services are about side effects external to the task engine, such as
logging, I/O, spawning processes.

The problem with providing these things as tasks is that they need
lifecycle management (open/close) tied to things like "on project
load/unload", "around a task graph execution", or "around a background
job execution". So instead of introducing more hacks like the streams
hack, we could just make a mechanism for managed services, and in the
task macro DSL you could write something like service[Logger] or
logger.service or even logger.value - don't know which yet.

Possibly services could or should be an implementation detail of
certain tasks, rather than looking different in the task macro DSL.
Then you would do something like `val logger = taskKey("the logger
service")` and `logger := service[Logger]`, and the only people using
a "services" API would be people creating a service.
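
A minimal sketch of that second variant, where service[Logger] is the hypothetical mechanism described above (not an existing sbt API):

// Only the person defining the logger key touches the services API; ordinary
// tasks just use logger.value as usual.
val logger = taskKey[Logger]("the logger service")
logger := service[Logger] // sbt ties open/close to the appropriate lifecycle

val myTask = taskKey[Unit]("example consumer")
myTask := {
  logger.value.info("logging via a managed service")
}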

Anyway "services" is an idea that is referenced in some TODO comments
in the above PR.

Havoc

