Problem using commandline BLAT with large/multiple input files


Florian

Sep 23, 2015, 11:28:04 AM
to gen...@soe.ucsc.edu
Hello,

This might be a very basic question, but I just can't seem to solve my
problem and have no experience with BLAT.

I am currently trying to follow this protocol:
http://bioinf.uni-greifswald.de/augustus/binaries/readme.rnaseq.html

I have several FASTQ files from RNA-Seq, each between 4 and 21 GB. I
tried making a list of the query filenames and using it as input for BLAT,
but that does not seem to work, and I can't figure out how to
give multiple query files as input. I tried concatenating them all into
one very large file, but then I receive the following error message:

needLargeMem: trying to allocate 230896608790 bytes (limit: 17179869184)

Actually, that error already happens when I try to run it with a single
file of 20 GB. I am working on a cluster, and the 20 GB file should
normally be no problem for the system.


So basically I have two questions: How can I run BLAT on multiple input
files? And how can I deal with large input files?


Thanks for any help,
Florian

Matthew Speir

Sep 23, 2015, 4:35:56 PM
to Florian, gen...@soe.ucsc.edu
Hi Florian,

Thank you for your questions about using BLAT. The command line BLAT
program can only take a single input file at a time. If you have
multiple query files, then you will need to run BLAT once for each query
file. There is no way to tell a single BLAT instance to use multiple
input files.
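
As a rough illustration of running BLAT once per query file, here is a
minimal Python sketch. It assumes the blat executable is on your PATH and
uses only the basic positional form "blat database query output.psl". The
function names and the file names in the usage note are hypothetical, and
no BLAT options are set.

```python
import subprocess
from pathlib import Path


def build_blat_commands(database, query_files, out_dir):
    """Return one BLAT argument list per query file."""
    commands = []
    for query in query_files:
        # Name each PSL output after its query file, e.g. reads1.fa -> reads1.psl
        psl = Path(out_dir) / (Path(query).stem + ".psl")
        # Basic usage: blat database query output.psl
        commands.append(["blat", str(database), str(query), str(psl)])
    return commands


def run_blat_per_file(database, query_files, out_dir):
    """Run BLAT sequentially, once per query file; stop on the first failure."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_blat_commands(database, query_files, out_dir):
        # check=True raises CalledProcessError if BLAT exits with an error
        subprocess.run(cmd, check=True)
```

For example, run_blat_per_file("genome.2bit", ["reads1.fa", "reads2.fa"],
"psl_out") would start two separate BLAT runs, one after the other. On a
cluster, each command from build_blat_commands could instead be submitted
as its own job.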

One of our engineers notes that BLAT has its own memory limitations
based on its design that are independent of the system it's being run
on. If your input files are too large, then you can split your input
file up into a number of smaller files. If you're running this on a
cluster then you'd benefit even more from splitting up the input files.
That way you can take advantage of multiple cluster nodes to speed up
the alignment process. We do this all the time.
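
One possible way to do the splitting is sketched below in Python. This is
not a UCSC tool, only an illustration: the function name, the chunk naming
scheme (".partNNNN.fa") and the choice to split by number of sequences are
all assumptions. It reads the input line by line, so the whole file never
has to fit in memory.

```python
from pathlib import Path


def split_fasta(fasta_path, records_per_chunk, out_dir):
    """Split a FASTA file into chunks of at most records_per_chunk sequences.

    Returns the list of chunk paths that were written.
    """
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(fasta_path).stem
    chunk_paths = []
    out = None
    count = 0  # number of records written to the current chunk
    try:
        with open(fasta_path) as src:
            for line in src:
                if line.startswith(">"):
                    # Open a new chunk when none is open or the current one is full
                    if out is None or count == records_per_chunk:
                        if out is not None:
                            out.close()
                        path = out_dir / f"{stem}.part{len(chunk_paths):04d}.fa"
                        chunk_paths.append(path)
                        out = open(path, "w")
                        count = 0
                    count += 1
                if out is not None:  # ignore anything before the first header
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

Each returned chunk can then be given to its own BLAT run. A suitable
chunk size depends on your read length and the memory available per node,
so it is left as a parameter here.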

Note that BLAT requires sequences to be input in fasta, nib or 2bit
format. You can read about these different formats here:
http://genome.ucsc.edu/FAQ/FAQformat.html.
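
Since the reads in question are in FASTQ, they would need to be converted
first. The following Python sketch shows one simple way to do that. It
assumes plain four-line FASTQ records (header, sequence, "+" line,
quality) with no line wrapping, and the function name is hypothetical.
Quality values are discarded because FASTA has no place for them.

```python
def fastq_to_fasta(fastq_lines, fasta_out):
    """Convert four-line FASTQ records to FASTA; return the number of records."""
    n = 0
    lines = iter(fastq_lines)
    for header in lines:
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(lines, "").rstrip("\r\n")
        plus = next(lines, "").rstrip("\r\n")
        next(lines, "")  # quality string, not needed for FASTA
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near: {header!r}")
        # FASTA header is the FASTQ header with "@" replaced by ">"
        fasta_out.write(f">{header[1:]}\n{seq}\n")
        n += 1
    return n
```

With open file handles this could be called as fastq_to_fasta(src, dst),
where src is opened for reading and dst for writing. Both are streamed
line by line, so large files are not loaded into memory at once.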

You can run BLAT on the command line without any arguments to see all of
the input options. You can also learn more about BLAT on the following
help pages:
http://genome.ucsc.edu/FAQ/FAQblat.html
http://genome.ucsc.edu/goldenPath/help/blatSpec.html

Lastly, you may want to look into some other programs, such as BWA or
Bowtie, that are designed for aligning short reads from RNA sequencing to a
genome. Often these programs will take fastq files as input.

I hope this is helpful. If you have any further questions, please reply
to gen...@soe.ucsc.edu. All messages sent to that address are archived
on a publicly-accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group