TaskVine: Description of an application

sprock

Jul 7, 2023, 10:05:47 AM

to Cooperative Computing Tools

Hello,

As promised, a description of the application (farmfarms) I'm working on.

The application runs in two stages. In the first stage it is given one or more <farmname> directories, from which it creates tarballs <parent>_<farmname>.tar.gz containing the inputs for a computational experiment. Prefixing <parent> ensures that the tarball name is unique.
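The naming scheme above can be sketched in a few lines of C++. This is only an illustration, not the farmfarms code: the function name tarball_name is hypothetical, and it assumes <parent> means the name of the directory that directly contains the <farmname> directory.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Build "<parent>_<farmname>.tar.gz" from a farm directory path.
// Assumption: <parent> is the name of the directory containing the farm
// directory. A bare relative path such as "farmA" has no parent component,
// so the result would start with "_".
std::string tarball_name(const fs::path &farm_dir)
{
    // Normalise so that "a/b/./farm" behaves like "a/b/farm".
    fs::path p = farm_dir.lexically_normal();
    if (!p.has_filename()) {
        p = p.parent_path(); // drop a trailing separator, e.g. "a/b/farm/"
    }
    const std::string farm = p.filename().string();
    const std::string parent = p.parent_path().filename().string();
    return parent + "_" + farm + ".tar.gz";
}
```

With this reading, two farms that share a name but live under different parents (for example /data/exp1/farmA and /data/exp2/farmA) get distinct tarball names.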

In the second stage the application runs a shell script (rubymeta.sh) that copies a ruby script (runmeta.rb) to the working directory and runs it. runmeta.rb copies two further ruby scripts (runner.rb, restart.rb) to the working directory, along with some other scripts that post-process the output in preparation for loading data into an SQL database, although that functionality is currently disabled. runmeta.rb starts runner.rb, the heart of the application. runner.rb creates and sends <parent>_<farmname>.tar.gz to a slurm host, unpacks it, submits the job to slurm and waits for completion, signalled by the creation of a results file <parent>_<farmname>.tgz, which it recovers to the working directory, unpacks and (if not disabled) processes and loads into an SQL database.

If the job timed out then restart.rb edits some input files, renames <parent>_<farmname> and invokes runner.rb with the 'new' <farmname>. Currently, the restart facility is disabled because it depends on accessing the database to discover the outcome of the calculation, and database submission is disabled while I get my taskvine application running.

Most of this is working well, but at the moment I cannot get TaskVine to distribute the stage 2 work to more than one worker at a time. One worker always gets all the work, even when more than one worker is available.

The TaskVine application is written in C++ and I'm running on FreeBSD 12.4 and 13.2.

Thanks for reading this far. If you have suggestions on how to debug, or want to see my debug logs, please ask.

Thanks,

Roger

Colin Thomas

Jul 7, 2023, 12:42:14 PM

to Cooperative Computing Tools

A debug log might be helpful in seeing what's happening.

In addition, you may want to take a look at one of our visualization tools, vine_plot_txn_log. The transactions log should be in the runinfo directory, along with the debug log and performance log.

Something like this:

vine_plot_txn_log <transaction_log> --mode workers

should provide you with a graph of each worker that was connected, and a visual representation of tasks running over their lifetime.
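As a plain-text complement to the plot, the manager program could also tally how many completed tasks each worker ran. The sketch below is a minimal illustration, not part of TaskVine: the function name tasks_per_worker is hypothetical, and it assumes the application has already collected one worker hostname per completed task (how that hostname is obtained from TaskVine is not shown here).

```cpp
#include <map>
#include <string>
#include <vector>

// Count how many completed tasks each worker ran.
// Assumption: `hosts` holds one worker hostname per completed task,
// gathered by the application as tasks are returned.
std::map<std::string, int> tasks_per_worker(const std::vector<std::string> &hosts)
{
    std::map<std::string, int> counts;
    for (const auto &host : hosts) {
        ++counts[host]; // operator[] value-initialises a missing entry to 0
    }
    return counts;
}
```

If the resulting map has a single key while several workers are connected, that confirms the imbalance described in the original post and narrows the search to scheduling rather than task submission.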

Roger Mason

Jul 7, 2023, 2:12:12 PM

to cctoo...@googlegroups.com

"'Colin Thomas' via Cooperative Computing Tools" <cctoo...@googlegroups.com> writes:

> A debug log might be helpful in seeing what's happening.
>
> In addition you may want to take a look at one of our visualization tools; vine_plot_txn_log. The transactions log should be placed in the runinfo directory along with the debug log and performance log.
>
> Something like this:
> vine_plot_txn_log <transaction_log> --mode workers
> should provide you with a graph of each worker that was connected, and a visual representation of tasks running over their lifetime.

Here is the debug log from the non-running worker.

I'll try the viz tools tomorrow.

Thanks for looking at this.

Roger

fo.dbg.gz
