Memory usage and threads

Hernan Morales

Jan 20, 2022, 8:36:24 AM
to GetOrganelle
Hi,

I want to assemble animal mitogenomes for one bird species. In my first run I ran out of memory using one thread and 34 GB. Is there any way to estimate the amount of memory needed? And is it better to use a single thread or multiple threads?

This is the command I used:
get_organelle_from_reads.py -1 D2102046629_reads.1.fq.gz -2 D2102046629_reads.2.fq.gz -R 10 -k 21,45,65,85,105 -F animal_mt -o getOrganbelle_D2102046629_reads_animal_mt_out

Those fastq files are quite large, ~64 GB each, which should be enough to get around 75X coverage for a 1.1 Gb genome.
Should I also reduce "--max-reads"?
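Just to illustrate what I mean, I suppose a capped run would look something like the line below (the 1.5E8 value is only a placeholder I picked for the example, not a tested setting):

get_organelle_from_reads.py -1 D2102046629_reads.1.fq.gz -2 D2102046629_reads.2.fq.gz -R 10 -k 21,45,65,85,105 -F animal_mt --max-reads 1.5E8 -o getOrganbelle_D2102046629_reads_animal_mt_out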

Here is the log file:
2022-01-19 14:18:21,759 - INFO: Pre-reading fastq ...
2022-01-19 14:18:21,759 - INFO: Estimating reads to use ... (to use all reads, set '--reduce-reads-for-coverage inf --max-reads inf')
2022-01-19 14:18:21,966 - INFO: Tasting 100000+100000 reads ...
2022-01-19 14:20:22,906 - INFO: Tasting 500000+500000 reads ...
2022-01-19 14:23:29,634 - INFO: Tasting 2500000+2500000 reads ...
2022-01-19 14:31:51,227 - INFO: Tasting 12500000+12500000 reads ...
2022-01-19 14:57:30,490 - INFO: Tasting 62500000+62500000 reads ...
2022-01-19 17:00:33,719 - INFO: Estimating reads to use finished.
2022-01-19 17:00:33,722 - INFO: Unzipping reads file: D2102046629_reads.1.fq.gz (68180469154 bytes)
2022-01-19 17:26:56,642 - INFO: Unzipping reads file: D2102046629_reads.2.fq.gz (64799842753 bytes)
2022-01-19 17:53:35,958 - INFO: Counting read qualities ...
2022-01-19 17:53:36,569 - INFO: Identified quality encoding format = Sanger
2022-01-19 17:53:36,570 - INFO: Phred offset = 33
2022-01-19 17:53:36,573 - INFO: Trimming bases with qualities (0.00%): 33..33  !
2022-01-19 17:53:36,817 - INFO: Mean error rate = 0.0057
2022-01-19 17:53:36,818 - INFO: Counting read lengths ...
2022-01-19 18:29:57,371 - INFO: Mean = 100.0 bp, maximum = 100 bp.
2022-01-19 18:29:57,372 - INFO: Reads used = 300000000+300000000
2022-01-19 18:29:57,372 - INFO: Pre-reading fastq finished.

2022-01-19 18:29:57,373 - INFO: Making seed reads ...
2022-01-19 18:29:57,467 - INFO: Seed bowtie2 index existed!
2022-01-19 18:29:57,468 - INFO: Mapping reads to seed bowtie2 index ...
2022-01-20 01:49:15,762 - INFO: Mapping finished.
2022-01-20 01:49:15,787 - INFO: Seed reads made: getOrganbelle_D2102046629_reads_animal_mt_out/seed/animal_mt.initial.fq (1244321 bytes)
2022-01-20 01:49:15,794 - INFO: Making seed reads finished.

2022-01-20 01:49:15,794 - INFO: Checking seed reads and parameters ...
2022-01-20 01:49:15,794 - INFO: The automatically-estimated parameter(s) do not ensure the best choice(s).
2022-01-20 01:49:15,795 - INFO: If the result graph is not a circular organelle genome,
2022-01-20 01:49:15,796 - INFO:   you could adjust the value(s) of '-w'/'-R' for another new run.
2022-01-20 01:49:25,941 - INFO: Pre-assembling mapped reads ...
2022-01-20 01:50:54,887 - INFO: Pre-assembling mapped reads finished.
2022-01-20 01:50:54,888 - INFO: Estimated animal_mt-hitting base-coverage = 40.13
2022-01-20 01:50:55,290 - INFO: Estimated word size(s): 61
2022-01-20 01:50:55,291 - INFO: Setting '-w 61'
2022-01-20 01:50:55,291 - INFO: Setting '--max-extending-len inf'
2022-01-20 01:50:55,404 - INFO: Checking seed reads and parameters finished.

2022-01-20 01:50:55,405 - INFO: Making read index ...
slurmstepd: error: Job 30000873 exceeded memory limit (35656317952 > 35651584000), being killed
slurmstepd: error: Exceeded job memory limit
slurmstepd: error: *** JOB 30000873 ON node920 CANCELLED AT 2022-01-20T02:58:01 ***

jj3111

Jan 20, 2022, 3:32:32 PM
to GetOrganelle
It seems that the data has an extremely low percentage of mitochondrial reads: although almost all of the ~64 GB of reads were used, the estimated mt coverage is only about 40x, which is very unusual. In the similar cases I have heard of, this was because organelle reads had been excluded from the raw data. I am not sure whether your data is the same case.

For the memory issue, you may use "--memory-save", which should reduce memory usage significantly in your case. You are also recommended to increase -R, e.g. to -R 30.

Increasing the number of threads will of course reduce the running time, but the marginal benefit decays as the thread count goes up.

In summary, you can resume the current job by submitting:
get_organelle_from_reads.py -1 D2102046629_reads.1.fq.gz -2 D2102046629_reads.2.fq.gz -k 21,45,65,85,105 -F animal_mt -o getOrganbelle_D2102046629_reads_animal_mt_out -R 30 --continue --memory-save
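If the scheduler keeps killing the job, a rough sketch of a batch script for the resumed run could look like the lines below (the 48G memory request, the 8 threads and the environment line are placeholders to adapt to your cluster; -t sets the thread count, check 'get_organelle_from_reads.py -h' if your version differs):

#!/bin/bash
#SBATCH --job-name=getorg_animal_mt
#SBATCH --cpus-per-task=8    # passed to -t below; placeholder value
#SBATCH --mem=48G            # more headroom than the ~34 GB limit that was exceeded
#SBATCH --time=48:00:00
# activate the environment that provides GetOrganelle (placeholder)
# conda activate getorganelle

get_organelle_from_reads.py \
  -1 D2102046629_reads.1.fq.gz -2 D2102046629_reads.2.fq.gz \
  -k 21,45,65,85,105 -F animal_mt \
  -o getOrganbelle_D2102046629_reads_animal_mt_out \
  -R 30 -t "${SLURM_CPUS_PER_TASK}" --continue --memory-save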

Let me know your updates

