limitGenomeGenerateRAM too small for genome

davis...@gmail.com

Jun 9, 2019, 2:38:24 AM

to pigx

Hello PigX Team!

I'm trying to run PigX on an Ubuntu 18.04 LTS HPC cluster with the command below. PigX itself is installed in /dados/pigx/install/bin/, and I'm running it from a directory on a mounted high-performance disk.

/dados/pigx/install/bin/pigx-scrnaseq -s settings.yaml sample_sheet.csv

Here's my output:



Jun 09 03:22:29 ..... started STAR run
Jun 09 03:22:29 ... starting to generate Genome files
Job counts:
    count    jobs
    1    fasta_dict
    1
Job counts:
    count    jobs
    1    change_gtf_id
    1
<generator object Namedlist.items at 0x7ff1d7df83b8>
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append, as.data.frame, cbind, colMeans, colnames,
    colSums, do.call, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, lengths, Map, mapply, match,
    mget, order, paste, pmax, pmax.int, pmin, pmin.int, Position, rank,
    rbind, Reduce, rowMeans, rownames, rowSums, sapply, setdiff, sort,
    table, tapply, union, unique, unsplit, which, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following object is masked from ‘package:base’:

    expand.grid

Loading required package: IRanges
Loading required package: GenomeInfoDb
[Sun Jun  9 03:23:47 2019]
Finished job 2.
2 of 72 steps (3%) done

EXITING because of FATAL PARAMETER ERROR: limitGenomeGenerateRAM=31000000000is too small for your genome
SOLUTION: please specify --limitGenomeGenerateRAM not less than 82829841109 and make that much RAM available 

Jun 09 03:25:37 ...... FATAL ERROR, exiting
[Sun Jun  9 03:25:40 2019]
Error in rule make_star_reference:
    jobid: 4
    output: /dados/pigx/output/Annotation/hg19/STAR_INDEX/done.txt
    log: /dados/pigx/output/Log/hg19.make_star_reference.log (check log file(s) for error message)
    shell:
        
        /dados/biologia/pigx/pigx2/STAR/source/STAR --runMode genomeGenerate --genomeDir /dados/pigx/output/Annotation/hg19/STAR_INDEX --genomeFastaFiles /dados/pigx/output/Annotation/hg19/hg19.fasta --runThreadN 8 --sjdbGTFfile /dados/pigx/output/Annotation/hg19/hg19.gtf --sjdbOverhang 99
        touch /dados/pigx/output/Annotation/hg19/STAR_INDEX/done.txt 2> /dados/pigx/output/Log/hg19.make_star_reference.log

    (exited with non-zero exit code)

^CTraceback (most recent call last):
  File "/dados/pigx/install/bin/pigx-scrnaseq", line 439, in <module>
    subprocess.run(command)
  File "/dados/miniconda3/envs/pigx_scrnaseq/lib/python3.6/subprocess.py", line 425, in run
    stdout, stderr = process.communicate(input, timeout=timeout)
  File "/dados/miniconda3/envs/pigx_scrnaseq/lib/python3.6/subprocess.py", line 855, in communicate
    self.wait()
  File "/dados/miniconda3/envs/pigx_scrnaseq/lib/python3.6/subprocess.py", line 1477, in wait
    (pid, sts) = self._try_wait(0)
  File "/dados/miniconda3/envs/pigx_scrnaseq/lib/python3.6/subprocess.py", line 1424, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt

I've searched the internet and found that this might be due to the missing --limitGenomeGenerateRAM argument, whose default value is too small. I've looked for the default parameter configuration in every single file of both the run directory and the installation directory. Is there any way to pass this parameter to STAR? A lot of the work I've been doing depends on this, and any help would be extremely valuable.
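
For reference, the STAR error above already states the minimum value it wants (82829841109 bytes, roughly 83 GB). A minimal sketch, assuming STAR is run by hand outside the pipeline: the helper names `required_ram_bytes` and `star_index_command` are hypothetical, the paths and options are copied from the failing rule in the log, and `--limitGenomeGenerateRAM` is the option named in the error message. The sketch only builds the command line; the machine still needs that much RAM available, as the error says.

```python
import re

# Error text copied from the STAR output above.
STAR_ERROR = (
    "EXITING because of FATAL PARAMETER ERROR: "
    "limitGenomeGenerateRAM=31000000000is too small for your genome\n"
    "SOLUTION: please specify --limitGenomeGenerateRAM not less than "
    "82829841109 and make that much RAM available"
)


def required_ram_bytes(message):
    """Return the RAM limit (in bytes) that STAR asks for, or None if absent."""
    match = re.search(r"--limitGenomeGenerateRAM not less than (\d+)", message)
    return int(match.group(1)) if match else None


def star_index_command(star, genome_dir, fasta, gtf, ram_bytes, threads=8):
    """Rebuild the genomeGenerate call from the log and append the RAM limit."""
    return [
        star, "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta,
        "--runThreadN", str(threads),
        "--sjdbGTFfile", gtf,
        "--sjdbOverhang", "99",
        "--limitGenomeGenerateRAM", str(ram_bytes),
    ]


ram = required_ram_bytes(STAR_ERROR)
command = star_index_command(
    "/dados/biologia/pigx/pigx2/STAR/source/STAR",
    "/dados/pigx/output/Annotation/hg19/STAR_INDEX",
    "/dados/pigx/output/Annotation/hg19/hg19.fasta",
    "/dados/pigx/output/Annotation/hg19/hg19.gtf",
    ram,
)
print(" ".join(command))
```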

Ricardo Wurmus

Jun 10, 2019, 4:03:23 PM

to davis...@gmail.com, pigx

Hi,

thanks for using PiGx!

> I've searched the internet and found out that this might be due to the
> lack of the --limitGenomeGenerateRAM argument, which is set to a too
> small value by default. I've looked for the default parameters
> configuration on every single file of both the running directory and
> the installation directory. Is there any way to give this parameter to
> STAR? A lot of the work I've been doing depends on this, and any help
> would be extremely valuable.

We have a mechanism for passing custom arguments to individual tools via
the settings file (and it’s used for R, for example), but unfortunately
it doesn’t currently cover arguments to STAR.

We’ll add it and prepare a new release soon. Stay tuned!

--
Ricardo

davis...@gmail.com

Jun 14, 2019, 4:56:23 PM

to pigx

That's great, Ricardo! I'll be waiting anxiously!

Is there any estimate of when the new release might come out?

Best regards,

Davi

Ricardo Wurmus

Jun 14, 2019, 5:18:28 PM

to davis...@gmail.com, pigx

davis...@gmail.com writes:

> That's great, Ricardo! I'll be waiting anxiously!
>
> Is there any estimate of when the new release might come out?

I’m hopeful that it’s going to happen next week. The code is already in
place, but we need some more rigorous testing before we can release it.

If you want to give it a spin you’re welcome to fetch the code from the
master branch at https://github.com/BIMSBbioinfo/pigx_scrnaseq

If you need a tarball you can build it yourself by running “make dist”
after bootstrapping the build system and configuring the build.

If you’re using Guix (which we strongly recommend for reproducibility
reasons) I can give you a little recipe that automatically builds the
package for you using the latest source code.

--
Ricardo

Paulo Borges

Dec 14, 2020, 11:16:56 AM

to pigx

Hi there!

Excellent work!

I posted this message in the GitHub issues section, but it seems that there is much more activity here! :)

I have been trying to set up the RNAseq pipeline to run without using STAR for indexing, but so far I have not been able to figure it out.

The main reason is that, for my purposes, I only need SALMON.
Additionally, when using STAR the pipeline always runs into memory issues, and I have tried all the possible parameters to limit the amount used. I have 32 GB of RAM, which is not sufficient (mouse as the model organism). I also cannot use any other machine or cluster with more RAM.

Given this, I am wondering the following:

Is it possible to disable STAR? If so, some help on how to do it would be highly appreciated.
Will disabling STAR affect the outputs generated by the pipeline?

---------

After reading some threads here, I found the answer to my second question. However, the first one is still open.

I read somewhere here that both SALMON and STAR are run for reproducible results... I think it would be nice to give the user that option as a parameter, i.e. run both or select which one to use.

Thank you.
Paulo

Alexander Blume

Dec 14, 2020, 11:59:46 AM

to Paulo Borges, pigx

Hi Paulo,

I am not the maintainer of the RNAseq pipeline, but maybe I can still help.

You might be able to skip parts of the pipeline by specifying execution branches directly via the `--target` argument, like this:

pigx-rnaseq -s tests/settings.yaml tests/sample_sheet.csv --target={deseq_report_salmon_transcripts,deseq_report_salmon_genes}

This way the pipeline would not run any parts involving STAR.

There shouldn’t be any difference in the SALMON based output, as both branches of the pipeline are executed independently.
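
To see what the pipeline receives from that command: in bash, `--target={a,b}` is brace-expanded into two separate `--target=` arguments before the program starts. A minimal sketch with a hypothetical argparse parser (this is not the real pigx-rnaseq option handling, which may differ; only `-s`, `--target` and the sample sheet argument are taken from this thread):

```python
import argparse

# Hypothetical stand-in for the pigx-rnaseq command line; not the real parser.
parser = argparse.ArgumentParser(prog="pigx-rnaseq")
parser.add_argument("-s", dest="settings")
parser.add_argument("--target", action="append", default=[])
parser.add_argument("sample_sheet")

# What bash passes to the program after expanding
#   --target={deseq_report_salmon_transcripts,deseq_report_salmon_genes}
argv = [
    "-s", "tests/settings.yaml",
    "tests/sample_sheet.csv",
    "--target=deseq_report_salmon_transcripts",
    "--target=deseq_report_salmon_genes",
]

args = parser.parse_args(argv)
print(args.target)
```

Each repeated `--target=` value is collected into one list, so the program sees two clean target names and never sees the braces.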

Best,

Alex

Alexander Blume (Gosdschan)

PhD Student

Berlin Institute for

Medical Systems Biology (BIMSB)

at Max Delbrueck Center (MDC)

Hannoversche Str. 28

10115 Berlin, Germany

Phone: +49 30 9406 1422

Email: alexand...@mdc-berlin.de

Web: https://www.mdc-berlin.de/bioinformatics

--
You received this message because you are subscribed to the Google Groups "pigx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pigx+uns...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pigx/dbf1914c-8049-4c52-b5db-324e9898fc13n%40googlegroups.com.

Paulo Borges

Dec 15, 2020, 5:37:15 AM

to pigx

Hi Alexander!

I tried the command line you suggested: pigx-rnaseq -s settings.yaml sample_sheet.csv --target={deseq_report_salmon_transcripts, deseq_report_salmon_genes}

I got the following error: pigx-rnaseq: error: unrecognized arguments: deseq_report_salmon_genes}

However, I checked the pigx_rnaseq.py file, and that string is defined as a key in the targets dictionary.

I also tried to run: pigx-rnaseq -s settings.yaml sample_sheet.csv --target={deseq_report_salmon_transcripts}

The following error shows up:

KeyError in line 199 of /gnu/store/sncsjra5x8mr7a890gq43yqnrm3kc1rz-pigx-rnaseq-0.0.10/libexec/pigx_rnaseq/pigx_rnaseq.py:
'{deseq_report_salmon_transcripts}'
File "/gnu/store/sncsjra5x8mr7a890gq43yqnrm3kc1rz-pigx-rnaseq-0.0.10/libexec/pigx_rnaseq/pigx_rnaseq.py", line 199, in <module>
File "/gnu/store/sncsjra5x8mr7a890gq43yqnrm3kc1rz-pigx-rnaseq-0.0.10/libexec/pigx_rnaseq/pigx_rnaseq.py", line 199, in <listcomp>

A similar error occurs when running: pigx-rnaseq -s settings.yaml sample_sheet.csv --target={deseq_report_salmon_genes}

It seems to me that --target is not working properly.

1. I tried each of the possible keys (including the default key 'final-report'), and the KeyError in line 199 persists.

2. If I pass more than one value to --target, the "unrecognized arguments" error is triggered for every value except the first.

I went even further and commented out the lines under the 'final-report' key that are related to STAR, and the result was:

TokenError:
('EOF in multi-line statement', (507, 0))

Does anyone have suggestions on what I can do next?

Cheers,

Paulo

Paulo Borges

Dec 15, 2020, 5:54:36 AM

to pigx

Hi again!

I corrected my edits in targets['final-report']['files'], and the error TokenError: ('EOF in multi-line statement', (507, 0)) went away.

However, STAR still runs.

The following is a copy of my edits in the targets dictionary.

targets = {
    # rule to print all rule descriptions
    'help': {
        'description': "Print all rules and their descriptions.",
        'files': []
    },
    'final-report': {
        'description': "Produce a comprehensive report. This is the default target.",
        'files':
      #[os.path.join(OUTPUT_DIR, 'star_index', "SAindex"),
            [os.path.join(OUTPUT_DIR, 'salmon_index', "sa.bin")] +
            #os.path.join(MULTIQC_DIR, 'multiqc_report.html')] +
      [os.path.join(COUNTS_DIR, "raw_counts", "counts_from_SALMON.transcripts.tsv"),
            os.path.join(COUNTS_DIR, "raw_counts", "counts_from_SALMON.genes.tsv"),
            os.path.join(COUNTS_DIR, "normalized", "TPM_counts_from_SALMON.transcripts.tsv"),
            os.path.join(COUNTS_DIR, "normalized", "TPM_counts_from_SALMON.genes.tsv"),
            #os.path.join(COUNTS_DIR, "raw_counts", "counts_from_star.tsv"),
            os.path.join(COUNTS_DIR, "normalized", "deseq_normalized_counts.tsv"),
            os.path.join(COUNTS_DIR, "normalized", "deseq_size_factors.txt")] +
      #expand(os.path.join(BIGWIG_DIR, '{sample}.forward.bigwig'), sample = SAMPLES) +
      #expand(os.path.join(BIGWIG_DIR, '{sample}.reverse.bigwig'), sample = SAMPLES) +
      #expand(os.path.join(OUTPUT_DIR, "report", '{analysis}.star.deseq.report.html'), analysis = DE_ANALYSIS_LIST.keys()) +
      expand(os.path.join(OUTPUT_DIR, "report", '{analysis}.salmon.transcripts.deseq.report.html'), analysis = DE_ANALYSIS_LIST.keys()) +
      expand(os.path.join(OUTPUT_DIR, "report", '{analysis}.salmon.genes.deseq.report.html'), analysis = DE_ANALYSIS_LIST.keys())
    },
    #'deseq_report_star': {
    #    'description': "Produce one HTML report for each analysis based on STAR results.",
    #    'files':
    #      expand(os.path.join(OUTPUT_DIR, "report", '{analysis}.star.deseq.report.html'), analysis = DE_ANALYSIS_LIST.keys())
    #},
    'deseq_report_salmon_transcripts': {
        'description': "Produce one HTML report for each analysis based on SALMON results at transcript level.",
        'files':
          expand(os.path.join(OUTPUT_DIR, "report", '{analysis}.salmon.transcripts.deseq.report.html'), analysis = DE_ANALYSIS_LIST.keys())
    },
    'deseq_report_salmon_genes': {
        'description': "Produce one HTML report for each analysis based on SALMON results at gene level.",
        'files':
          expand(os.path.join(OUTPUT_DIR, "report", '{analysis}.salmon.genes.deseq.report.html'), analysis = DE_ANALYSIS_LIST.keys())
    },
    #'star_map' : {
    #    'description': "Produce a STAR mapping results in BAM file format.",
    #    'files':
    #      expand(os.path.join(MAPPED_READS_DIR, '{sample}_Aligned.sortedByCoord.out.bam'), sample = SAMPLES)
    #},
    #'star_counts': {
    #    'description': "Get count matrix from STAR mapping results using summarizeOverlaps.",
    #    'files':
    #      [os.path.join(COUNTS_DIR, "raw_counts", "counts_from_star.tsv")]
    #},
    #'genome_coverage': {
    #    'description': "Compute genome coverage values from BAM files - save in bigwig format",
    #    'files':
    #      expand(os.path.join(BIGWIG_DIR, '{sample}.forward.bigwig'), sample = SAMPLES) +
    #      expand(os.path.join(BIGWIG_DIR, '{sample}.reverse.bigwig'), sample = SAMPLES)
    #},
    'fastqc': {
        'description': "post-mapping quality control by FASTQC.",
        'files':
          expand(os.path.join(FASTQC_DIR, '{sample}_Aligned.sortedByCoord.out_fastqc.zip'), sample = SAMPLES)
    },
    'salmon_index' : {
        'description': "Create SALMON index file.",
        'files':
          [os.path.join(OUTPUT_DIR, 'salmon_index', "sa.bin")]
    },
    'salmon_quant' : {
        'description': "Calculate read counts per transcript using SALMON.",
        'files':
          expand(os.path.join(SALMON_DIR, "{sample}", "quant.sf"), sample = SAMPLES) +
      expand(os.path.join(SALMON_DIR, "{sample}", "quant.genes.sf"), sample = SAMPLES)
    },
    'salmon_counts': {
        'description': "Get count matrix from SALMON quant.",
        'files':
          [os.path.join(COUNTS_DIR, "raw_counts", "counts_from_SALMON.transcripts.tsv"),
       os.path.join(COUNTS_DIR, "raw_counts", "counts_from_SALMON.genes.tsv"),
       os.path.join(COUNTS_DIR, "normalized", "TPM_counts_from_SALMON.transcripts.tsv"),
       os.path.join(COUNTS_DIR, "normalized", "TPM_counts_from_SALMON.genes.tsv")]
    },
    #'multiqc': {
    #    'description': "Get multiQC report based on STAR alignments and fastQC reports.",
    #    'files':
    #      [os.path.join(MULTIQC_DIR, 'multiqc_report.html')]
    #}
}
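
For context, a target name is just a key into this dictionary, which is why an unexpected string such as '{deseq_report_salmon_transcripts}' ends in a KeyError. A simplified sketch of that lookup, assuming a toy `targets` dictionary with placeholder file names and a hypothetical helper `files_for`; the actual code at line 199 of pigx_rnaseq.py is not shown in this thread and may differ:

```python
# Toy stand-in for the real dictionary; the file names are placeholders.
targets = {
    "salmon_index": {"files": ["salmon_index/sa.bin"]},
    "deseq_report_salmon_genes": {
        "files": ["report/analysis1.salmon.genes.deseq.report.html"]
    },
}


def files_for(selected, targets):
    """Collect the output files of the selected targets, rejecting unknown names."""
    unknown = [name for name in selected if name not in targets]
    if unknown:
        known = ", ".join(sorted(targets))
        raise ValueError(f"unknown target(s) {unknown}; known targets: {known}")
    return [path for name in selected for path in targets[name]["files"]]


print(files_for(["salmon_index"], targets))

try:
    # The literal braces make this a different string from any dictionary key.
    files_for(["{deseq_report_salmon_genes}"], targets)
except ValueError as error:
    print(error)
```

Checking the names up front gives a message that lists the valid targets instead of a bare KeyError.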

Bora Uyar

Dec 15, 2020, 6:08:17 AM

to Paulo Borges, pigx

Hi Paulo,

The sample sheet is the only positional argument, and it needs to be the last argument.

The target needs to be passed before the sample sheet like this:

pigx-rnaseq -s settings.yaml --target deseq_report_salmon_transcripts sample_sheet.csv

Best,

Bora

--

_____________

Dr. Bora Uyar

Bioinformatics Scientist

Bioinformatics and Omics Data Science

Max Delbrueck Center (MDC) for Molecular Medicine

The Berlin Institute for Medical Systems Biology (BIMSB):
Hannoversche Str. 28, 10115 Berlin

web: http://bioinformatics.mdc-berlin.de/team.html#bora-uyar-phd
email: bora...@mdc-berlin.de
office tel: +49 30 9406 1545
mobile: +49 172 949 5680

Paulo Borges

Dec 15, 2020, 6:31:58 AM

to pigx

Hi Bora,

Many thanks!

The command line you suggested returns:

TokenError:
('EOF in multi-line statement', (507, 0))

I managed to run what I wanted (everything except STAR).

I realized that I had forgotten to comment out two more lines related to STAR (see my previous post):

os.path.join(COUNTS_DIR, "normalized", "deseq_normalized_counts.tsv"),
os.path.join(COUNTS_DIR, "normalized", "deseq_size_factors.txt")] +

Of course, this is far from the safest way to do this...

PS: I am running on the datasets provided for testing.

Alexander Blume

Dec 15, 2020, 6:49:58 AM

to Bora Uyar, Paulo Borges, pigx

Hi Paulo,

I was able to reproduce your first error. The command probably failed because of the extra space in front of deseq_report_salmon_genes in your target specification.

There are two ways to specify targets: a single target can be given as `--target final-report`, and multiple targets are given like `--target={report2,report3}`. However, for the second variant we take the values as they come, so any additional whitespace breaks the call.

@Bora We should address this by removing leading and trailing whitespace from the given target values. I'll open an issue about this.
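
The failure can be reproduced without PiGx. A sketch, assuming a POSIX-like shell: `shlex.split` mimics how the shell splits words on whitespace (it does not perform brace expansion), and `normalize_targets` is a hypothetical helper, not existing PiGx code, illustrating the kind of whitespace clean-up proposed above for a quoted, comma-separated value.

```python
import shlex

# The space after the comma ends the word, so the program receives two tokens
# and the braces are never expanded.
tokens = shlex.split(
    "--target={deseq_report_salmon_transcripts, deseq_report_salmon_genes}"
)
print(tokens)


def normalize_targets(raw):
    """Split a comma-separated target string and trim braces and whitespace."""
    inner = raw.strip().strip("{}")
    return [name.strip() for name in inner.split(",") if name.strip()]


# If the value is quoted on the command line, it arrives as a single string.
print(normalize_targets("{deseq_report_salmon_transcripts, deseq_report_salmon_genes}"))
print(normalize_targets("{deseq_report_salmon_transcripts}"))
```

The second token, `deseq_report_salmon_genes}`, is the one reported as an unrecognized argument earlier in the thread. A single name in braces is passed through literally, which matches the KeyError on '{deseq_report_salmon_transcripts}'.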

Best,

Alex
