The most common cause of this is running daa-meganizer with too little memory. Edit the file MEGAN/megan.vmoptions to set the amount of memory that MEGAN (and all MEGAN tools) can use; it should be at least 16GB for daa-meganizer to run comfortably.
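The memory limit in that file is the Java maximum-heap (-Xmx) line. A minimal sketch of the change, run here on a local dummy copy (on a real install you would edit MEGAN/megan.vmoptions in place; the 4G starting value is just a stand-in):

```shell
# Demo on a local dummy file; on a real install, edit MEGAN/megan.vmoptions.
printf -- '-Xmx4G\n' > megan.vmoptions          # stand-in for the shipped default
sed -i 's/^-Xmx.*/-Xmx16G/' megan.vmoptions     # raise the Java max heap to 16GB
cat megan.vmoptions
```

MEGAN and its command-line tools read this file at startup, so the new limit applies the next time daa-meganizer is launched.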
Thanks Daniel. I tried again with MEGAN v6.21.2 and 500GB of memory, but got the same outcome. There are active Java processes using up to 500GB, but the runs never get beyond 90% after 8 days on the HPC. The failed input files have about 530 million reads, yet other files with similar read counts have succeeded with less than 300GB, so the outcome does not depend only on input file size.
I will try again with 750GB memory. But grateful for any suggestions on what is using up memory, or how to process such large data.
I have never tried to meganize a file with 530 million reads (only around 80 million reads).
I will have to revisit my code to see whether there are any obvious bottlenecks that I can remove.
In the current version, I try to avoid creating global tables etc. while parsing through a file, but there may well be tables that I assumed would never grow big enough to cause a problem.
I am working on 144 samples (MEGAHIT assembled PE short reads, gene prediction with Prodigal, alignment to nr with diamond). I have successfully used the same pipeline before on 60 metagenome samples in another, very similar project. In the current project, some daa files meganize successfully, some get stuck at 90% writing.
To narrow down this issue, I tried to compare successful and failed samples starting from the gene-predicted faa files, but could not find obvious differences (fasta headers and sequence data look reasonable, ran diamond again and thoroughly checked the log, made sure I can view the daa files, etc.)
Hi everyone, I deleted my last comment because I was finally able to run the meganizer. The total memory it used for a 28GB .daa file was 78GB, so increasing the Java heap space to 128GB worked. I also increased the number of threads to 256 (-t 256) and requested 32 cores to avoid any problems.
This may seem crazy simple, but is there a way to have the bash script save/export the meganized summary files (ideally into a new sub-directory) without having to call JavaApplicationStub?
Once you have a meganized DAA file, you can run the daa2info tool to extract a MEGAN summary file.
Or you can use the compute-comparison tool to compute a comparison file for multiple meganized-DAA files.
That said, I had written a little bash script to meganize 40+ .daa files on a compute cluster. Had I known about daa2info and compute-comparison, would those have worked as well, e.g. replacing daa-meganizer with daa2info?
You always have to first run daa-meganizer because that program computes the classifications of the reads. Only then can you use daa2info to extract summaries or compute-comparison to compute a comparison of samples.
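The ordering above can be sketched as a batch loop over the .daa files. This is a dry run that only prints the per-file commands (pipe the output to `sh`, or wrap each pair in a cluster job submission, to actually execute them); the directory layout, placeholder file names, and the -mdb mapping-database flag are assumptions here, so check each tool's --help on your install:

```shell
# Dry-run batch: print the daa-meganizer/daa2info command for each .daa file.
# Directory names and the -mdb flag are assumptions for illustration only.
mkdir -p daa summaries
touch daa/sample1.daa daa/sample2.daa   # placeholders standing in for DIAMOND output

for f in daa/*.daa; do
  base=$(basename "${f%.daa}")
  echo "daa-meganizer -i $f -mdb megan-map.db"     # classify the reads first
  echo "daa2info -i $f -o summaries/$base.txt"     # then extract a summary
done | tee commands.txt
```

Printing the commands first also gives you a record (commands.txt) of exactly what was run for each sample.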
I am currently using MEGAN for my soil metagenomics project. After using the latest version of DIAMOND to get the DAA files, I imported these files into MEGAN along with the GI mapping files (GI-Taxo, GI-KEGG, GI-SEED) downloaded from the MEGAN6 website. The results show that none of my sequences can be assigned to any function or taxonomy; alternatively, the software says that my DAA files from DIAMOND are not DAA files. Has anyone faced this kind of situation before?
I already tried the meganizer tool, but it also said the DAA file is not a DAA file. Yesterday, I found out that the file I got from DIAMOND is not actually a DAA file, because I used -o output.daa instead of -a output.daa. It's OK now, but the number of sequences assigned to SEED is too low; it might be because of my sequences or something else. Thank you guys.
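For anyone hitting the same "not a DAA file" error, the distinction between the flags looks like this. These commands are not runnable as-is (they need DIAMOND and a formatted nr database), and the exact flags depend on your DIAMOND version, so check `diamond help`:

```shell
# Wrong: -o alone writes DIAMOND's default tabular output, even if the
# file name ends in .daa, so MEGAN rejects it as "not a DAA file".
diamond blastp -d nr -q genes.faa -o output.daa

# Older DIAMOND releases: -a writes a true DAA file (as noted above).
diamond blastp -d nr -q genes.faa -a output

# Recent DIAMOND releases: request the DAA format explicitly instead.
diamond blastp -d nr -q genes.faa --outfmt 100 -o output.daa
```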