One of our engineers had this to say:
Regarding the message:
Couldn't bind socket to 8000: Address already in use
Error accepting the connection
Error accepting the connection
[...]
This seems to indicate that gfServer is still running on that port. To stop gfServer from the command line, it must have been STARTED with the -canStop option. I believe the problem of failing to connect over and over has been fixed. Are you using an old version of BLAT?
But really that will not matter at all because you should not be using blat this way. With a cluster you should be using stand-alone blat and not gfClient/gfServer. Genome data is often "embarrassingly parallelizable", and that certainly applies to alignment.
What we do is split the target and query into multiple parts
and then run jobs which blat each combination of parts Qi
against Tj. The simplest way to split them up would be to have a
job for each chromosome. A more sophisticated method splits
chromosomes into chunks (possibly overlapping), runs standalone
blat on each combination of pieces on the cluster, and then lifts
the results back into place and chains them together. If the
queries are already lots of small pieces like genes or RNAs, then
you can still split your many sequences into multiple input files
and then have a cluster job call blat for each input file against
some target chromosome (or chunk).
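The splitting scheme described above could be sketched roughly as follows. This is only an illustration, not the scripts from the make docs: the function names (split_fasta, chunk_windows, blat_jobs), the file naming and the output layout are all made up for the example. The only thing taken from BLAT itself is the basic stand-alone command form "blat database query output.psl". The sketch splits a FASTA file into several parts, computes overlapping chunk windows for a long sequence, and writes out one blat command per (Tj, Qi) combination, which could then be handed to whatever cluster scheduler you use.

```python
from itertools import product
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path, out_dir, n_parts, prefix):
    """Distribute records round-robin into at most n_parts FASTA files.

    Hypothetical helper: the naming scheme (prefix + 3-digit index) is
    an assumption made for this example.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    buckets = [[] for _ in range(n_parts)]
    for i, record in enumerate(read_fasta(path)):
        buckets[i % n_parts].append(record)
    paths = []
    for i, bucket in enumerate(buckets):
        if not bucket:
            continue  # fewer records than parts: skip empty files
        part = out_dir / f"{prefix}{i:03d}.fa"
        with open(part, "w") as out:
            for header, seq in bucket:
                out.write(f">{header}\n{seq}\n")
        paths.append(part)
    return paths


def chunk_windows(length, chunk_size, overlap):
    """Return (start, end) windows covering [0, length) with an overlap.

    The start of each window is the offset that has to be added back to
    the alignment coordinates when lifting results into place.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    windows, start = [], 0
    while True:
        end = min(start + chunk_size, length)
        windows.append((start, end))
        if end == length:
            break
        start = end - overlap
    return windows


def blat_jobs(target_parts, query_parts, psl_dir):
    """Build one stand-alone blat command line per (Tj, Qi) combination."""
    return [
        f"blat {t} {q} {psl_dir}/{t.stem}_{q.stem}.psl"
        for t, q in product(target_parts, query_parts)
    ]
```

With, say, 3 target parts and 4 query parts, blat_jobs returns 12 independent command lines, one per cluster job. Lifting and chaining the resulting PSL files is not covered by this sketch.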
You can find many detailed examples of using BLAT on clusters
in our make docs in our source tree. Look under
kent/src/hg/makeDb/doc/. Grep the .txt files for "blat". Many of
the examples will show the use of parasol to run cluster jobs.
Parasol was created by Jim Kent. Note that Jim Kent is the
author and owner of BLAT.
--
***NOTE: Be sure to download the gfClient and gfServer programs. Bellerophon uses the Blat server and not the Blat executable. This results in a considerable increase in speed.
Hello Alan,
You will have to check with the authors of Bellerophon for the specific performance issues they encountered with command-line BLAT versus a gfServer/gfClient setup; it is possible their tool makes queries in a manner better handled by gfServer. I can say that users on our cluster have never reported a BLAT performance issue that was resolved by switching to the client/server version. If you are interested in trying command-line BLAT with Bellerophon for comparison, you may be able to make that change relatively easily - command-line BLAT accepts many of the same parameters as gfServer/Client.
Please let us know if you continue to have issues with the updated version of BLAT.
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. Questions sent to that address will be archived in a publicly-accessible forum for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group
--