Re: Hash table and memory requirement


Nathan Clement

Nov 11, 2013, 6:29:37 PM
to gnumap...@googlegroups.com
Let me try and explain as best I can. There's a paper that goes into more technical detail if you're interested: http://www.hicomb.org/papers/HICOMB2011-04.pdf


On Sun, Nov 10, 2013 at 8:46 AM, Casey <vuanh...@gmail.com> wrote:
Please help to confirm/clarify the following for running GNUMAP with MPI:

1. One hash table for the reference genome is built for each process (even when the processes are on the same machine), isn't it?
 
There is an important difference between "process" and "thread" in this context. When running as a single process, you can use multiple threads, which will all share the same hash table. However, when you run with multiple processes (using MPI, for example), there is no way to share the same hash table, so one must be allocated for each process. Usually MPI is used when you have many different machines available.
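To make the memory consequence concrete, here is a small Python sketch that counts hash-table copies per machine: threads inside one process share a single table, while every MPI process holds its own copy, even when several processes land on the same host. The table size and host names are hypothetical inputs for illustration, not GNUMAP parameters.

```python
from collections import Counter


def per_machine_table_gb(table_gb: float, process_hosts: list[str]) -> dict[str, float]:
    """Estimate hash-table memory per host.

    process_hosts has one entry per MPI process (the host it runs on).
    Threads are not listed: they share their process's single table,
    so only the number of processes per host multiplies the memory.
    """
    counts = Counter(process_hosts)
    return {host: n * table_gb for host, n in counts.items()}


# Hypothetical layout: two processes on node1, one on node2, 4 GB table each.
print(per_machine_table_gb(4.0, ["node1", "node1", "node2"]))
```

Running two processes on one machine doubles the table memory there, whereas running one process with two threads does not.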
 

2. I think the mer-size the program uses by default is 14, am I right? (version 3.0.2)

The --fast flag sets the mer size to be 14. The default is 10.
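For readers unfamiliar with the term, "mer size" is the length k of the substrings (k-mers) used as hash-table keys. The Python sketch below is a generic k-mer index, not GNUMAP's actual data structure; it shows what k controls and how quickly the space of possible keys grows between k = 10 and k = 14.

```python
from collections import defaultdict


def build_kmer_index(seq: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring (mer) of seq to the 0-based positions where it starts."""
    index: defaultdict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return dict(index)


def possible_kmers(k: int) -> int:
    """Upper bound on distinct DNA k-mers over the alphabet A, C, G, T."""
    return 4 ** k


print(build_kmer_index("ACGTACGT", 4)["ACGT"])  # starting positions of "ACGT"
print(possible_kmers(10), possible_kmers(14))
```

Longer mers are more specific keys, so a lookup generally returns fewer candidate positions; how GNUMAP trades this off internally is described in the paper linked above rather than here.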
 

3. If my reference genome is in a folder of multiple FASTA files, the program will build one hash table for all the files?

You can't pass in a folder, but you can pass in multiple FASTA files. But yes, it will build one hash table for the entire genome.
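As an illustration of "one hash table for the entire genome", here is a minimal Python sketch that reads several FASTA inputs and indexes all of their records into a single table, tagging each hit with its record name. The function names are hypothetical and the FASTA parsing is deliberately simple; it is not GNUMAP's loader.

```python
import io
from collections import defaultdict
from typing import Iterable, Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def build_genome_index(handles: Iterable[TextIO], k: int) -> dict[str, list[tuple[str, int]]]:
    """Build ONE k-mer table covering every record in every FASTA input."""
    index: defaultdict[str, list[tuple[str, int]]] = defaultdict(list)
    for handle in handles:
        for name, seq in read_fasta(handle):
            for i in range(len(seq) - k + 1):
                index[seq[i:i + k]].append((name, i))
    return dict(index)


# In practice the handles would come from open(path) for each FASTA file.
files = [io.StringIO(">chr1\nACGTAC\n"), io.StringIO(">chr2\nGTACGT\n")]
print(build_genome_index(files, 4)["GTAC"])
```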
 

4. Do you have an estimation for memory requirement if the reference genome is the whole human genome HG19?

Are you just doing a naive alignment? If so, around 12 GB should be sufficient.
 

5. I still don't understand how --MPI_largemem works. Can you explain it again?

Please see the paper linked above for more details. In a nutshell, when the entire genome fits on a single process (see explanation above) but you have multiple machines available, you can run with MPI without the --MPI_largemem flag and it will duplicate the hash table, genome, etc. Each process will then map only a select portion of the reads.
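The default (no --MPI_largemem) mode amounts to "every process has the whole index, but each maps a different share of the reads". The Python sketch below shows one way such a share could be chosen, a round-robin split by process rank; the split rule is an assumption for illustration, not necessarily what GNUMAP does. In a real MPI program, `rank` and `size` would come from something like mpi4py's `MPI.COMM_WORLD.Get_rank()` and `Get_size()`.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def reads_for_rank(reads: Sequence[T], rank: int, size: int) -> list[T]:
    """Round-robin split (an assumed rule): process `rank` of `size` takes
    reads rank, rank + size, rank + 2*size, ... so every read goes to exactly one process."""
    if not 0 <= rank < size:
        raise ValueError("rank must be in [0, size)")
    return list(reads[rank::size])


reads = [f"read{i}" for i in range(7)]
for rank in range(3):
    print(rank, reads_for_rank(reads, rank, 3))
```

Because each process already holds a full copy of the hash table and genome, no communication is needed during mapping; the cost is that memory per process is the same as a single-process run.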

If you don't have enough memory to fit the entire genome on a single machine, pass the --MPI_largemem flag and GNUMAP will automatically split the genome into equally sized chunks and map every read against each chunk.
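The --MPI_largemem mode inverts the previous split: the genome is divided and the reads are not. Here is a minimal Python sketch of dividing a genome of a given length into equally sized half-open ranges, one per process, where each process would then receive the full read set. How GNUMAP handles reads that straddle a chunk boundary is not described in this thread, so the sketch ignores it.

```python
def chunk_bounds(genome_length: int, n_chunks: int) -> list[tuple[int, int]]:
    """Split [0, genome_length) into n_chunks half-open ranges whose sizes differ by at most 1."""
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    base, extra = divmod(genome_length, n_chunks)
    bounds, start = [], 0
    for i in range(n_chunks):
        end = start + base + (1 if i < extra else 0)  # first `extra` chunks get one more base
        bounds.append((start, end))
        start = end
    return bounds


# Each process p would index only genome[start:end] for its chunk,
# then align ALL reads against that chunk.
print(chunk_bounds(10, 3))
```

Memory per process shrinks roughly in proportion to the number of chunks, at the cost of every read being aligned once per chunk.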

 

I'm looking forward to your reply/explanation. It would be very helpful. Thank you.

Casey

--
 
---
You received this message because you are subscribed to the Google Groups "gnumap-users" group.
