Is it possible to extract threads value outside exec?


mrmxs

Aug 2, 2020, 6:30:09 PM
to bpipe-discuss
Is it possible to extract the threads value outside exec?

When I try to dump the object I get:

println "$threads" + threads.dump()

__bpipe_lazy_resource_threads__ <java.lang.String@59174af4 value=[95, 95, 98, 112, 105, 112, 101, 95, 108, 97, 122, 121, 95, 114, 101, 115, 111, 117, 114, 99, 101, 95, 116, 104, 114, 101, 97, 100, 115, 95, 95] coder=0 hash=1494698740 hashIsZero=false>
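
A side note on reading that dump: the `value=[...]` array appears to be nothing more than the character codes of the placeholder name itself, so no thread count is hidden inside it. A minimal plain-Groovy check (no Bpipe involved; the variable names are only for illustration):

```groovy
// Character codes copied from the value=[...] part of the dump
def codes = [95, 95, 98, 112, 105, 112, 101, 95, 108, 97, 122, 121, 95,
             114, 101, 115, 111, 117, 114, 99, 101, 95, 116, 104, 114,
             101, 97, 100, 115, 95, 95]

// Interpret the codes as ASCII bytes and rebuild the string
def decoded = new String(codes as byte[], 'US-ASCII')

println decoded
```

This prints `__bpipe_lazy_resource_threads__`, i.e. the same placeholder text that `println "$threads"` showed.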

How do I get the number of threads I specified on the CLI from this?

thnx 

mr

mrmxs

Aug 2, 2020, 6:30:09 PM
to bpipe-discuss
Hi 

My question is rather simple. I am trying to implement a load balancer in my bpipe pipelines, and for that I need the number of threads that was provided on the CLI. But "$threads" is special, in that

println "$threads" + threads.dump()

returns

"__bpipe_lazy_resource_threads__ <java.lang.String@59174af4 value=[95, 95, 98, 112, 105, 112, 101, 95, 108, 97, 122, 121, 95, 114, 101, 115, 111, 117, 114, 99, 101, 95, 116, 104, 114, 101, 97, 100, 115, 95, 95] coder=0 hash=1494698740 hashIsZero=false>"

How do I extract the number from this?


thnx

mr

Simon Sadedin

Aug 3, 2020, 5:54:05 AM
to bpipe-discuss on behalf of mrmxs
Hi mr,

Unfortunately it's possibly a little tricky, but it depends on what exactly you want to achieve.

The reason it shows up like that is because the actual threads assigned to a task are allocated "just in time" in some situations. Bpipe supports variable threads for a command, where you can declare in bpipe.config something like:

procs=3..6

This causes Bpipe to wait until it is ready to actually launch the command, and then at the last minute it will use up to 6 threads, but if that cannot be satisfied it will still launch it as long as at least 3 are available. You can read a bit more about it at:


I was wondering if you could explain a bit more about your load balancer concept, as Bpipe might support what you want through another mechanism?

Cheers,

Simon


--
You received this message because you are subscribed to the Google Groups "bpipe-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bpipe-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bpipe-discuss/670dc0ba-2530-4231-8d04-eec0c093e4b3o%40googlegroups.com.

mrmxs

Aug 3, 2020, 11:35:17 AM
to bpipe-discuss
First of all, let me apologize for double posting. I know I posted twice, but I thought the first one did not go through. Well, maybe there is a better way... let me describe "The Situation":

Let's say I am mapping X paired-end samples:

a_R1.fq.gz
a_R2.fq.gz
b_R1.fq.gz
b_R2.fq.gz
...

First I want to do QC, so I execute

bpipe run -n 4 my.pipe


```
// in my.pipe
run{

  FastQC + ...

}
```
passing $threads (4) to: fastqc --threads $threads ... (which is wrapped in the FastQC stage).

All good here: fastqc will queue my submission, processing 4 files at a time.

Now I'm out of FastQC and on to BWA (or any other aligner). What I can do here is either sort the input so that it passes through my BWA stage sequentially, using a loop to feed bwa with pairs (R1, R2) and threading bwa in the process:


```
// in my.pipe
run{

  FastQC + BWA ...

}
```
where BWA contains something like: bwa mem -t $threads ...

Or I could sort it by `R*` and pass the input to BWA like this:

```
// in my.pipe
run{

  FastQC + "%_R*.fq.gz" * [BWA] ...

}

```
But now I have forked my stage by X * $threads (if threads are defined at runtime). What I would like is for my total utilized threads to always equal the number defined at runtime, even if I fork my pipeline. Therefore, if `"%_R*.fq.gz" * [BWA]` is used, my threads definition needs to be reduced to 1 and the input renamed accordingly, so that if X > $threads, not all X tasks are submitted at once but X mod $threads (with threads in bwa mem set to 1), while the others wait in a queue. Am I making any sense?

The point is that I need a way to manipulate the number of utilized threads so that it never surpasses the number given at runtime, given that my pipeline has various thread definitions set across its processing stages and at different levels: level one defined at runtime (bpipe run -n 4 ...), level two defined in the stage header, and level three defined dynamically in the course of processing (using using()). All of these need to be coherent in execution and never surpass the one defined at runtime.

:)


THANK YOU! :)


PS: Is there a better way to pass my inputs in batches without renaming the input cohort?

```
"%_R*.fq.gz" * [BWA]
```
Currently I am adding an additional index to the *.fq.gz files (%_S*.fq.gz) in order to pass 4 files instead of two to my BWA stage (still processing them two at a time so that the number of threads is not exceeded at any point).

Simon Sadedin

Aug 4, 2020, 7:15:27 AM
to bpipe-discuss on behalf of mrmxs
Hi mr,

I might be not fully understanding - but as long as you're running it all within 1 bpipe script, bpipe should absolutely manage the threads so they never exceed the value provided to the "-n" when you run it. That should be the case even if the pipeline is "forked" into separate paths. The only situation I'd think you'd have where it doesn't do that is if you run two completely separate instances of Bpipe.

Are you observing that it isn't working like this? If so we might investigate it as a bug. Or perhaps I'm not fully understanding.
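
To make that concrete, a minimal sketch of such a forked pipeline might look like the following. The stage body, the reference file name `ref.fa` and the output naming are assumptions made for illustration, not taken from your script:

```groovy
// my.pipe: illustrative sketch only
BWA = {
    // $threads is resolved by Bpipe when the command is actually launched;
    // ref.fa is a hypothetical reference file name
    exec "bwa mem -t $threads ref.fa $input1 $input2 > $output.sam"
}

run {
    // One BWA branch per sample: files are grouped by the part matched by %,
    // so a_R1.fq.gz and a_R2.fq.gz end up in the same branch
    "%_R*.fq.gz" * [ BWA ]
}
```

Run as `bpipe run -n 4 my.pipe`, the expectation is that the BWA branches together never use more than 4 threads, however many samples match the pattern.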

Cheers!

Simon


mrmxs

Aug 5, 2020, 10:32:12 AM
to bpipe-discuss
Simon hi,


I must apologize for not reading the manual thoroughly. You understood exactly what I am trying to do and you are right, it is just that I never tested it. I took it for granted that if it forks my tasks, it does so as a function of input size, but if I limit my pipe with -n, it scales accordingly. Bpipe surprises me every time I use it. If you are the Simon (as in: the first author of the 2012 paper and the repo's maintainer), I thank you and your co-authors for the tool! It is streamlining my job immensely. Before, I was just playing with it, but now I am in a position to potentially make it a standard across several groups I am currently providing bioinfo support for.


cheers!

mr


sorry for typos---

