PIGx RNA-seq HISAT2 error

Elizavet...@mdc-berlin.de

Oct 11, 2022, 5:21:34 AM

to pi...@googlegroups.com

Dear PIGx developers,

My name is Elizaveta Kulaeva. I am currently an intern at MDC, and I am using your RNA-seq pipeline on the Max cluster. An error occurs in the rule hisat2_index while the pipeline is running. I can't work out the exact source of the error, so I am writing to you. I am attaching the log file as a picture (it looks like something is wrong with the reference genome, but I don't know what exactly).

I use this reference genome, annotation and transcriptome from Ensembl:

Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz

Homo_sapiens.GRCh38.107.chr.gtf.gz

Homo_sapiens.GRCh38.cdna.all.fa.gz

Thank you in advance for your reply!

Sincerely,

Elizaveta Kulaeva, Bunina lab, MDC

hisat2_log.PNG

Bora Uyar

Oct 11, 2022, 5:26:47 AM

to Elizavet...@mdc-berlin.de, pi...@googlegroups.com

Hi Elizaveta,

Can you let us know what the error message is?

I am guessing it is about insufficient memory. You can modify the memory requirements in the "settings.yaml" file.

Go to "execution -> rules -> hisat2-build -> memory" and set it to a higher value (say 32000). The genome might be too large to fit in the memory available on the node the job is submitted to.

Best,

Bora

--

_____________
Dr. Bora Uyar
Bioinformatics Scientist
Bioinformatics and Omics Data Science
Max Delbrueck Center (MDC) for Molecular Medicine
The Berlin Institute for Medical Systems Biology (BIMSB):
Hannoversche Str. 28, 10115 Berlin
email: bora...@mdc-berlin.de
mobile: +49 172 949 5680

Elizavet...@mdc-berlin.de

Oct 11, 2022, 5:50:07 AM

to Bora Uyar, pi...@googlegroups.com

Here is the error message:

I request 80G of memory from the cluster (and the rule hisat2-index also asked for resources: mem_mb=32000).

I am also providing a modified set.yaml settings file (based on the sample from the tutorial http://bioinformatics.mdc-berlin.de/pigx_docs/pigx-rna-seq.html#preparing-the-input) and attaching it to this email. I also have a settings.yaml file in my guix profile, but I use set.yaml when running PIGx.

Which file should I modify?

P.S. I apologise in advance if my questions seem ridiculous. I am new to working with clusters and have only worked in R before.

set.yaml.PNG

settings.yaml.PNG

Bora Uyar

Oct 11, 2022, 6:26:36 AM

to Elizavet...@mdc-berlin.de, pi...@googlegroups.com

Hi Elizaveta,

In the set.yaml file (the one you attached an image of), you can add a section under "execution" that looks like this:

```
execution:
  submit-to-cluster: yes
  jobs: 4
  nice: 19
  mem_mb: 128000
  rules:
    hisat2-build:
      threads: 2
      memory: 32000
```

Maybe try decreasing the number of jobs, increasing the mem_mb field, and asking for more memory for hisat2-build.

It seems the inputs are fine, the indexing starts working but gets cut off at some point. It sounds like a memory issue.

You can modify these fields to try which setting works for your genome of interest.

But before doing that, in order to rule out the possibility of other bugs, you could also try using a small portion of the genome as a test.

Can you maybe download just one of the small human chromosomes and use that to test if the pipeline works?

(e.g. http://ftp.ensembl.org/pub/release-107/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.chromosome.21.fa.gz)

Once you make sure of this, then you can provide the full genome sequence and see how much extra memory you need to provide here.
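For that test, only the genome entry in the settings file needs to point at the chromosome 21 file. A minimal sketch, assuming the settings file has a "locations" section with a "genome-fasta" key as in the tutorial's sample settings (neither key appears in this thread, so check your own file), and using a hypothetical download directory:

```yaml
locations:
  # Hypothetical path: wherever the chromosome 21 file was downloaded to.
  # Decompress it first if your setup expects an uncompressed FASTA.
  genome-fasta: /path/to/test/Homo_sapiens.GRCh38.dna.chromosome.21.fa.gz
```

The other entries in that section are left unchanged in this sketch.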

Best,

Bora

Elizavet...@mdc-berlin.de

Oct 11, 2022, 7:17:52 AM

to Bora Uyar, pi...@googlegroups.com

Thank you!

I checked with chromosome 21 and it works.

I also have two questions:

1) If I want to use STAR, do I need to change the line hisat2-build to star-build in the settings (in addition to changing the mapper type)?

2) How much memory do I need to provide to analyse the whole genome (how many GB should I set in the m_mem_free parameter for qsub)?

Thank you!

Elizaveta

Bora Uyar

Oct 11, 2022, 7:49:52 AM

to Elizavet...@mdc-berlin.de, pi...@googlegroups.com

Hi Elizaveta,

That's great. Now that we know the software is functioning, it is a matter of finding out the amount of resources you need for your genome of interest.

For the human genome, 32GB should be good enough, but it also depends on how jobs are scheduled on the specific cluster (if multiple jobs are assigned to the same node, they might share the memory). To simplify the problem, try running the pipeline with a single job (hisat2_index) and increase the requested memory if it fails.
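As a sketch of that single-job test, the "execution" section could look something like the following. It reuses only the keys from the example earlier in this thread; the values are starting points to adjust, not tested recommendations.

```yaml
execution:
  submit-to-cluster: yes
  jobs: 1            # one job at a time, so hisat2_index does not share node memory
  nice: 19
  mem_mb: 128000
  rules:
    hisat2-build:
      threads: 2
      memory: 32000  # raise this (e.g. to 64000) if the indexing job is cut off again
```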

If you want to use STAR, then you need to change the mapper type. If you want to change the resources specifically for STAR, then yes, you would need to add the configuration for STAR in the settings file.

If you run "pigx-rnaseq --init", you can see all available options for all tools and copy paste from there to your settings file.
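A hedged sketch of what such a STAR entry could look like under "execution -> rules". The rule name `star_index` is an assumption (it is not confirmed anywhere in this thread), so copy the exact name and default values from the "pigx-rnaseq --init" output rather than from here:

```yaml
execution:
  rules:
    star_index:        # assumed rule name; verify against the --init output
      threads: 2
      memory: 32000    # placeholder value; adjust to what your genome needs
```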

Best,

Bora

Elizavet...@mdc-berlin.de

Oct 11, 2022, 9:42:42 AM

to Bora Uyar, pi...@googlegroups.com

I've run everything with these parameters, and the pipeline works. Thank you for your help!

Best,
Lisa

Bora Uyar

Oct 11, 2022, 1:56:47 PM

to Elizavet...@mdc-berlin.de, pi...@googlegroups.com

Great! Could you maybe let us know the exact parameters you used that worked on the cluster?

Other users are also reporting similar issues with the cluster.

Elizavet...@mdc-berlin.de

Oct 12, 2022, 3:47:55 AM

to Bora Uyar, pi...@googlegroups.com

Yes, I have attached a yaml file with the settings to this email. Important: for my work I only need raw read counts from SALMON, so I haven't debugged the process further (now there is an error at the html-reporting step - rule report2).

Best,

Lisa

yaml-editor-online (2).yaml

Bora Uyar

Oct 12, 2022, 4:27:30 AM

to Elizavet...@mdc-berlin.de, pi...@googlegroups.com

No, I meant: which resources did you use to get hisat2 running (memory, jobs, etc.)?

What is the error that you get with the report2 rule?

Elizavet...@mdc-berlin.de

Oct 12, 2022, 4:38:35 AM

to Bora Uyar, pi...@googlegroups.com

I ran the STAR aligner instead of hisat2, and it worked with the following command:

qrsh -V -b n -l m_mem_free=50G /fast/AG_Bunina/test_rnaseq_2.sh

And with resources specified in the settings file that I sent you today.

Elizavet...@mdc-berlin.de

Oct 12, 2022, 4:39:46 AM

to Bora Uyar, pi...@googlegroups.com

Also, the error in the report2 rule was something like "description error".

Bora Uyar

Oct 12, 2022, 5:26:30 AM

to Elizavet...@mdc-berlin.de, pi...@googlegroups.com

Okay, so you solved the problem with hisat2 by not using it :)

It is good that we have alternative aligners then :)

Okay, thank you!

Bora Uyar

Oct 12, 2022, 5:28:02 AM

to Elizavet...@mdc-berlin.de, pi...@googlegroups.com

Sorry, without seeing the actual error message, it is hard to figure out the problem.

Elizavet...@mdc-berlin.de

Oct 12, 2022, 7:37:56 AM

to Bora Uyar, pi...@googlegroups.com

I ran the script again today, and for some reason the error did not appear this time (I don't know why).

I also need to run PIGx to process ChIP-seq data. I created the files set_chipseq.yaml and sample_sheet_chipseq.csv and a script test_chipseq.sh (all attached to this email), and I got an error that the file cdna.fasta is missing. However, this file type (cDNA) should not be needed for ChIP-seq: the tutorial states that only the genome and the GTF file are needed.

Here is the error message for the PIGx-ChIP-seq:

test_chipseq.sh

sample_sheet_chipseq.csv

set_chipseq.yaml

Bora Uyar

Oct 12, 2022, 7:56:50 AM

to Elizavet...@mdc-berlin.de, pi...@googlegroups.com

Hi Elizaveta,

Great to hear your RNA-seq issue is resolved.

Could you please open a new conversation about the ChIP-seq issue? (It is a different discussion, and a new thread is more likely to attract the attention of the main ChIP-seq pipeline developers.)

Best,

Bora

Elizavet...@mdc-berlin.de

Oct 12, 2022, 7:59:27 AM

to Bora Uyar, pi...@googlegroups.com

yes, of course, thank you!
