Hi Florian,
Thank you for your questions about using BLAT. The command-line BLAT
program accepts only a single query file per run. If you have
multiple query files, you will need to run BLAT once for each query
file. There is no way to tell a single BLAT instance to use multiple
input files.
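As a rough sketch (not an official UCSC tool), a short Python script can loop over your query files and launch one BLAT run per file. It assumes the `blat` executable is on your PATH and uses the basic `blat database query output.psl` form. The database and query file names below ("hg38.2bit", "reads_01.fa", and so on) and the helper function names are hypothetical placeholders.

```python
import subprocess
from pathlib import Path
from typing import List


def blat_commands(database: str, queries: List[str], out_dir: str = ".") -> List[List[str]]:
    """Build one BLAT command per query file (BLAT takes one query file per run).

    Each command uses the basic form: blat database query output.psl
    """
    commands = []
    for query in queries:
        # Name each output after its query, e.g. reads_01.fa -> reads_01.psl
        out_path = Path(out_dir) / (Path(query).stem + ".psl")
        commands.append(["blat", database, query, str(out_path)])
    return commands


def run_all(commands: List[List[str]]) -> None:
    """Run the BLAT jobs one after another; stop on the first failure."""
    for cmd in commands:
        # Make sure the output directory exists before BLAT writes to it
        Path(cmd[-1]).parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names; replace with your own database and query files.
    cmds = blat_commands("hg38.2bit", ["reads_01.fa", "reads_02.fa"], out_dir="results")
    for cmd in cmds:
        print(" ".join(cmd))
    # run_all(cmds)  # uncomment once blat is installed and on your PATH
```

On a cluster you would typically submit each of these commands as a separate job instead of calling run_all, so the runs proceed in parallel.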
One of our engineers notes that BLAT has its own memory limitations,
inherent to its design, that are independent of the system it runs
on. If your input files are too large, you can split each one into a
number of smaller files. If you're running BLAT on a cluster, splitting
up the input files helps even more, because you can spread the jobs
across multiple cluster nodes to speed up the alignment process. We do
this all the time.
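One way to do the splitting is sketched below in Python, assuming plain-text fasta input. It keeps each sequence record intact (header plus its sequence lines) and deals records round-robin into a chosen number of chunk files so the chunks stay roughly balanced. The function names and the "<prefix>_001.fa" output naming are hypothetical choices for illustration, not part of BLAT or any UCSC utility.

```python
from pathlib import Path
from typing import List


def read_fasta_records(text: str) -> List[str]:
    """Split fasta text into records, each starting with its '>' header line."""
    records = []
    current: List[str] = []
    for line in text.splitlines():
        if line.startswith(">"):
            if current:
                records.append("\n".join(current) + "\n")
            current = [line]
        elif current:
            # Sequence lines belong to the most recent header;
            # anything before the first header is ignored.
            current.append(line)
    if current:
        records.append("\n".join(current) + "\n")
    return records


def chunk_records(records: List[str], n_chunks: int) -> List[List[str]]:
    """Deal records round-robin into n_chunks groups so chunk sizes stay balanced."""
    chunks: List[List[str]] = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    # Drop empty chunks when there are fewer records than chunks
    return [c for c in chunks if c]


def split_fasta(path: str, n_chunks: int, out_prefix: str) -> List[str]:
    """Write each chunk to <out_prefix>_<k>.fa and return the new file names."""
    records = read_fasta_records(Path(path).read_text())
    out_files = []
    for k, chunk in enumerate(chunk_records(records, n_chunks), start=1):
        out_name = f"{out_prefix}_{k:03d}.fa"
        Path(out_name).write_text("".join(chunk))
        out_files.append(out_name)
    return out_files
```

For example, split_fasta("reads.fa", 20, "reads") would write up to 20 files (reads_001.fa, reads_002.fa, ...) that can each be given to a separate BLAT run or cluster node. Round-robin changes the order of records across files; if you need the original order preserved, splitting into contiguous blocks works just as well.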
Note that BLAT requires input sequences to be in fasta, nib, or 2bit
format. You can read about these different formats here:
http://genome.ucsc.edu/FAQ/FAQformat.html.
You can run BLAT on the command line without any arguments to see all of
the input options. You can also learn more about BLAT on the following
help pages:
http://genome.ucsc.edu/FAQ/FAQblat.html
http://genome.ucsc.edu/goldenPath/help/blatSpec.html
Lastly, you may want to look into some other programs, such as BWA or
Bowtie, that are designed for aligning short reads from RNA sequencing
to a genome. These programs often accept fastq files as input.
I hope this is helpful. If you have any further questions, please reply
to gen...@soe.ucsc.edu. All messages sent to that address are archived
on a publicly accessible Google Groups forum. If your question includes
sensitive data, you may send it instead to
genom...@soe.ucsc.edu.
Matthew Speir
UCSC Genome Bioinformatics Group