RNAseq pipeline not running jobs: Max cluster settings files issue

Bunina, Daria

Nov 23, 2022, 9:12:01 AM
to pi...@googlegroups.com
Hello,

I've been trying to run the RNAseq pipeline for the past few days and cannot get further than building the STAR index. There is no error message; the job that submits the sh run script just hangs on the cluster without any output for many hours. I noticed that at job start snakemake sends out about 4 jobs, but they disappear immediately afterwards and I cannot trace them anymore. If run in interactive mode, the screen stays at the job-submission stage with no other output. I attach an example of the output of such a run (error3.err). I have played with the memory requirements but stopped at 150G, since even 50G works for my lab member.

We did some troubleshooting in the group ourselves: the same settings file and the same sh run script work absolutely fine for my group member from her profile, but not for me, even though the pigx pipeline was installed in the same way and I can run the test samples just fine with my profile (although I suspect it runs the jobs locally and does not submit them to the Max cluster for the test data). Our current suspicion is that it might be related to the cluster settings: when I submit the job with qsub it generates a cluster_conf.json file whose parameters (such as the queue name and the memory parameter syntax) are very different from those of my lab member; see both files attached.

Even weirder: the pipeline seems to use trim galore rather than fastp for adapter trimming, even though that is not specified in the settings file… (I saw this in the log it created, see attached).

Could you please take a look and see what might be wrong and how to fix it? Here is the exact command I ran it with on Max cluster:

qsub -V -b n -l m_mem_free=100G -cwd -o output3.out -e error3.err /fast/AG_Bunina/Daria/PiGx/RNAseq/DEanalysis/runscript.sh

Thank you!

Best
Daria

————

Dr. Daria Bunina
Group Leader, MDC-Buch
Systems biology of cardiovascular and neuronal pathologies
Daria....@mdc-berlin.de

cluster_conf.json
settings_rnaseq_de.yaml
trim_galore_aged2.log
cluster_conf_notWorkingOne.json
error3.err
runscript.sh
sample_sheet_de.csv

Ricardo Wurmus

Nov 23, 2022, 1:38:46 PM
to Bunina, Daria, pi...@googlegroups.com
Hi,

according to the error log this is pigx-rnaseq-0.0.10, which is rather
old. Could it be that you used an old version of Guix to install the
pigx-rnaseq package? Was this deliberate?

You can get a newer version of Guix (and thus of packages it can
install) with “guix pull”.

--
Ricardo

Daria.Bunina

Nov 23, 2022, 4:58:56 PM
to Ricardo Wurmus, pi...@googlegroups.com
Hi Ricardo,

I did guix pull when installing pigx, and I also tried it again now and ran the workflow immediately after, still the same issue.

Also, running 

guix show pigx-rnaseq 

gives the version 0.1.0. But submitting the job script seems to launch an old version, as you mentioned: I saw a trimgalore job submitted (and disappear) again. Do you have an idea why it picks up some old pigx version, and from where? Shall I try to remove guix completely somehow and install it again? It must be something with my profile, but I can't trace it.

Best
Daria



Bora Uyar

Nov 24, 2022, 4:06:33 AM
to Daria.Bunina, Ricardo Wurmus, pi...@googlegroups.com
Hi Daria,
The cluster conf that doesn't work has the "queue" name wrong. It should be "all.q" rather than "all".  
Also if trim-galore is running, it means you are running an old version of pigx-rnaseq. 
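
A quick way to see which queue your generated file names, as a rough sketch: it assumes the key is spelled "queue" as quoted above and that the file is called cluster_conf.json in the current directory. `show_queue` is a hypothetical helper, not part of PiGx.

```shell
# Print every line of the given file that mentions a "queue" key, with line numbers.
show_queue() {
    grep -n '"queue"' "$1"
}

# Only run the check if the file is actually present (assumed file name).
if [ -f cluster_conf.json ]; then
    show_queue cluster_conf.json
fi
```

If the printed value is "all" rather than "all.q", that matches the problem described above.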

Can you check whether the pigx-rnaseq that runs when you submit to the cluster is the same as the pigx-rnaseq in your guix profile?

Could you first find out which executable is actually used? Can you type `which pigx-rnaseq` and then use the full path to that executable to figure out its version?
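
A minimal sketch of that check, assuming a POSIX shell and a `readlink` that supports `-f`. `resolve_cmd` is a hypothetical helper name, not something PiGx or Guix provides. For a command installed through a Guix profile, the resolved path usually points into a /gnu/store directory whose name carries the package version.

```shell
# Print the real file behind a command found on PATH, following symlinks.
resolve_cmd() {
    cmd_path=$(command -v "$1") || {
        echo "$1: not found on PATH" >&2
        return 1
    }
    readlink -f "$cmd_path"
}

# Check both entry points; a missing one is reported on stderr and skipped.
resolve_cmd pigx-rnaseq || true
resolve_cmd pigx || true
```

Running the same lines inside the submitted job script shows whether the cluster job sees the same executables as your login shell.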
Best,
Bora


--

_____________
Dr. Bora Uyar
Bioinformatics Scientist
Bioinformatics and Omics Data Science
Max Delbrueck Center (MDC) for Molecular Medicine
The Berlin Institute for Medical Systems Biology (BIMSB): 
Hannoversche Str. 28, 10115 Berlin 
email: bora...@mdc-berlin.de
mobile: +49 172 949 5680

Ricardo Wurmus

Nov 24, 2022, 4:44:10 AM
to Daria.Bunina, pi...@googlegroups.com
Hi,

Daria.Bunina <Daria....@mdc-berlin.de> writes:

> I did guix pull when installing pigx, and I also tried it again now
> and ran the workflow immediately after, still the same issue.

OK.

> Also, running
>
> guix show pigx-rnaseq
>
> gives the version 0.1.0.

Okay, this tells you that you have a decently recent version of Guix
that is in fact used. It tells you that you can get this version of
“pigx-rnaseq” if you choose to install it. It does not tell you,
however, whether this is in fact the version you have installed. More
on that later.

> Do you have an idea why it pulls out some old
> pigx version and from where?

Your script does the sensible thing of setting environment variables
according to what your Guix profile needs:

GUIX_PROFILE="/home/$USER/.guix-profile"
. "$GUIX_PROFILE/etc/profile"

Personally, I’d use “$HOME” instead of “/home/$USER”, because the home
directory might actually be located elsewhere or be mapped to a
different location. I don’t think this is the problem here, though.
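
As a sketch of that variant (the existence check and the warning message are my additions, not something your script has):

```shell
# Locate the default Guix profile via $HOME rather than a hard-coded /home path.
GUIX_PROFILE="$HOME/.guix-profile"

# Source the profile's environment only if it exists; otherwise warn on stderr.
if [ -f "$GUIX_PROFILE/etc/profile" ]; then
    . "$GUIX_PROFILE/etc/profile"
else
    echo "No Guix profile found at $GUIX_PROFILE/etc/profile" >&2
fi
```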

I note that your script launches the rnaseq pipeline like this:

pigx rnaseq

This means that you’re using the “pigx” package, which provides the
“pigx” executable with a bunch of sub-commands for the individual
pipelines, such as “rnaseq” or “bsseq”.

Let’s see what packages you actually installed to your profile. You can
view installed packages with “guix package --list-installed” or “guix
package -I” for short. My guess is that this will show you that the
“pigx” package has been installed.

With “guix package -l” we can see how your profile has changed over
time, so we can tell *when* that package was installed.
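
To pull just the version out of that listing, something like the following could work. It is a sketch that assumes each line of “guix package -I” output consists of whitespace-separated columns in the order name, version, output, store path. “installed_version” is a hypothetical helper name.

```shell
# Read `guix package -I` style lines on stdin and print the version
# recorded for the package named in $1.
installed_version() {
    awk -v pkg="$1" '$1 == pkg { print $2 }'
}

# Only query Guix when it is actually available on PATH.
if command -v guix >/dev/null 2>&1; then
    guix package -I | installed_version pigx
    guix package -I | installed_version pigx-rnaseq
fi
```

An empty result for a package means it is not in the current generation of the profile.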

My guess is that you have an old variant of the “pigx” package
installed. Please run “guix install pigx” (now that you are sure to
have a recent version of Guix after “guix pull”), which will install the
“pigx” package and all its dependencies with the current version of
Guix, thereby giving you a new version not just of the wrapper itself
but all of the new pipelines, too.

If you don’t actually care about any of the other pipelines, you can
also use just the rnaseq pipeline. To do that:

guix package --remove=pigx --install=pigx-rnaseq

The only change to your script would be to call “pigx-rnaseq” directly
instead of “pigx rnaseq”.

> Shall I try to remove guix
> completely somehow and install it again?

No. Guix is deterministic and (largely) free of stateful behavior, so
this would not accomplish anything. That’s by design.

--
Ricardo

Bunina, Daria

Nov 24, 2022, 6:48:42 AM
to Ricardo Wurmus, pi...@googlegroups.com, Bora Uyar
Hi both,

Thanks for the clarifications and suggestions!

After Bora’s reply I looked further into my profile and, even though my script adds guix profile variable upon its run, I still added it to my bash profile and ran it from the shell just before submitting the script. I also went to the guix manual and ran the following lines (I skipped guix pull since I've just done it yesterday):

snakejob.translate_sample_sheet_for_report.18.sh.o5889908
sample_sheet_de.csv
colData.tsv
snakejob.translate_sample_sheet_for_report.18.sh.e5889908
PastedGraphic-5.png
PastedGraphic-4.png
PastedGraphic-6.png

Bora Uyar

Nov 24, 2022, 6:56:30 AM
to pigx
Hi Daria,
The error is not about colData.tsv file. The reporting script is looking for a description tag that is supposed to describe the corresponding differential expression analysis. 
You add this tag in the settings.yaml file. For each differential expression analysis, you add a short description of what that analysis is used for. Later that description is printed in the html report. 

Like this:

analysis1:
    description: "comparison of aged vs young"
    case_sample_groups: "aged"
    control_sample_groups: "young"
    covariates: ''

Best,
Bora

Ricardo Wurmus

Nov 24, 2022, 7:08:24 AM
to Bunina, Daria, pi...@googlegroups.com, Uyar, Bora
Hi Daria,

> Ricardo to your points: the guix package -l gives me this:
>
> Generation 1 Feb 28 2022 13:43:01
> pigx 0.0.3 out /gnu/store/1h8311mr8bxd976ns5nv70sq7gizhx3i-pigx-0.0.3
>
> Generation 2 Feb 28 2022 14:08:35
> + pigx-rnaseq 0.0.10 out /gnu/store/857ggiddn2fz4nl56ml2a8c8hczyf1wp-pigx-rnaseq-0.0.10
>
> Generation 3 Nov 21 2022 16:45:36
> + pigx-rnaseq 0.1.0 out /gnu/store/37lgn81x74ddqhhsihm8vviddb9bn684-pigx-rnaseq-0.1.0
> - pigx-rnaseq 0.0.10 out /gnu/store/857ggiddn2fz4nl56ml2a8c8hczyf1wp-pigx-rnaseq-0.0.10
>
> Generation 4 Nov 24 2022 10:18:59 (current)
> + pigx-rnaseq 0.1.0 out /gnu/store/6306h9dhjk10026h0xfqsbhl8wqcn5gd-pigx-rnaseq-0.1.0
> + pigx 0.0.3 out /gnu/store/x3xmwv5dl1iqdrzm3qaf21bnd8vj80kg-pigx-0.0.3
> - pigx-rnaseq 0.1.0 out /gnu/store/37lgn81x74ddqhhsihm8vviddb9bn684-pigx-rnaseq-0.1.0
> - pigx 0.0.3 out /gnu/store/1h8311mr8bxd976ns5nv70sq7gizhx3i-pigx-0.0.3

Since you used “pigx rnaseq” (with “pigx” provided by the “pigx”
package) instead of “pigx-rnaseq” (an executable provided by the
“pigx-rnaseq” package) you did in fact end up using the “pigx” package
from Generation 1 instead of the “pigx-rnaseq” package from Generation 3.

We provide the “pigx” package as a convenient “batteries included”
collection of pipelines. It comes with a set of pipelines that are
independent of whatever other packages you may have installed, so it was
entirely unimpressed by your installation of the more recent version
0.1.0 of “pigx-rnaseq” in Generation 3. This kind of isolation from the
user’s software environment was one of the goals of the PiGx project to
make computational reproducibility the default.

In this case it can lead to confusion, because we’re also offering
packages for the individual pipelines, which have different entry points
(“pigx-rnaseq” vs “pigx” with an “rnaseq” sub-command).

> So now I think all the relevant packages are up to date, thank you
> both for your help and apologies for the trivial questions, as you
> might have noticed I haven’t worked with guix before. But maybe it
> makes sense to add more precise guidelines for guix
> installation/pull/updates and guix profile export to your pigx
> workflow documentation for people like me to enable really smooth
> installation of your software.

Yes, this is a good point. We’ll discuss this. It could well be that
the “pigx” meta-package causes more confusion than it is worth.

Thank you for your feedback!

--
Ricardo