Hello Jian Yu,
Thank you for your question about the memory requirements to run BLAT. One of our engineers suggests that the memory usage for BLAT is usually in the 2-4GB range for eukaryote assemblies. That can be reduced substantially by searching against only one chromosome at a time, instead of an entire assembly.
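If you will be running command-line standalone blat and would like to split the genome database into smaller pieces (for example, one run per chromosome), you can get more consistent genome-wide masking by creating an ooc file from the whole genome. The ooc file lists the over-represented 11-mers, so every per-chromosome run ignores the same overused tiles.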
To create the ooc, run the following command:
blat database.fa /dev/null /dev/null -makeOoc=11.ooc -repMatch=1024
Then to use it on chr1:
blat chr1.fa query.fa chr1.psl -ooc=11.ooc
You can repeat this for each chromosome. More information on using the -ooc option for BLAT can be found in our FAQ at http://genome.ucsc.edu/FAQ/FAQblat.html#blat6. More detailed memory usage information is provided in the documentation at http://genome.ucsc.edu/goldenPath/help/blatSpec.html.
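As a rough illustration of repeating the run for each chromosome, here is a minimal shell sketch. It assumes a hypothetical layout in which the per-chromosome FASTA files sit in a directory named chroms/ (chr1.fa, chr2.fa, ...), the queries are in query.fa, and 11.ooc was already built from the whole genome as shown above. The function name blat_per_chrom and the BLAT variable are made up for this example and are not part of blat itself.

```shell
#!/bin/sh
# Sketch: run standalone blat once per chromosome, reusing one genome-wide ooc file.
# Assumed (hypothetical) layout: chroms/chr1.fa, chroms/chr2.fa, ... plus query.fa.
# BLAT may be set to the path of the blat binary; it defaults to "blat" on PATH.

blat_per_chrom() {
    chrom_dir=$1   # directory holding the per-chromosome FASTA files
    query=$2       # query sequences
    ooc=$3         # ooc file made from the whole genome with -makeOoc
    out_dir=$4     # directory that receives one PSL file per chromosome

    mkdir -p "$out_dir" || return 1
    for target in "$chrom_dir"/*.fa; do
        [ -e "$target" ] || continue      # skip if the glob matched nothing
        name=$(basename "$target" .fa)
        # Same form as the chr1 command above, with the file names filled in.
        "${BLAT:-blat}" "$target" "$query" "$out_dir/$name.psl" -ooc="$ooc" || return 1
    done
}

# Example call (commented out so that loading this file runs nothing):
# blat_per_chrom chroms query.fa 11.ooc psl
```

Because each run loads only one chromosome, peak memory depends on the largest chromosome rather than on the whole assembly. Each output file is an ordinary PSL file with its own header.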
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group
--