Help - how to run BLAST on a large batch of files

Leandro de Mattos

Mar 11, 2015, 10:48:46 AM
to unix-and-perl-...@googlegroups.com
Dear colleagues,

I have a lot of files, each containing one protein, and I want to run a BLAST search of each file against a database (Eukaryotes).
Is there a way to do this automatically?


Thanks for any help.

Leandro


Matthew

Mar 11, 2015, 2:36:02 PM
to unix-and-perl-...@googlegroups.com
Check out File::Find and File::Recursive. You can use these modules to open all or some of the files in a folder, using a regex to select particular files, or to open particular folders and pick a file from each one.

Matthew
--
You received this message because you are subscribed to the Google Groups "Unix and Perl for Biologists" group.
To unsubscribe from this group and stop receiving emails from it, send an email to unix-and-perl-for-bi...@googlegroups.com.
To post to this group, send email to unix-and-perl-...@googlegroups.com.
Visit this group at http://groups.google.com/group/unix-and-perl-for-biologists.
For more options, visit https://groups.google.com/d/optout.

Keith Bradnam

Mar 11, 2015, 2:45:38 PM
to unix-and-perl-...@googlegroups.com
If:
  1. You want to search all sequences against the same database
  2. Each entry ends with a newline after the last amino acid character
  3. You are not particularly interested in the order in which you search the sequences 
Then you could just combine all sequences into one file to start with. You then have to process the multiple results, but they will be in one output file.

E.g. cat *.fasta > all_sequences_combined.fasta

If condition 2 is not met, you will end up with malformed FASTA records that might look like this (where the header of the 2nd record is mistakenly appended to the last sequence line of the 1st record):

>AT1G15970.1 | Symbols:  | methyladenine glycosylase family protein | chr1:5486538-5488488 REVERSE
MSVPPRFRSVNSDEREFRSVLGPTGNKLQRKPPGMKLEKPMMEKTIIDSKDEKAKKPTTP
ASPRTTLKQCSSLCSSILRKNSASMTASYSSDASSSCESSPLSVASSSSCKKVVRRSGSV
SSTRKLSVGKEEEKVSGDCFADGRKRCAWITPKADPCYVAFHDEEWGVPVHDDKKLFELL
CLSGALAELSWTDILSRRHILREVFMDFDPVAVAELNDKKLTAPGTAAISLLSEVKIRSI
LDNSRHVRKIIAECGSLKKYMWNFVNNKPTQSQFRYQRQVPVKTSKAEFISKDLVRRGFR
SVSPTVIYSFMQAAGLTNDHLIGCFRYQDCCVDAETTTTTKAKKKNERESDK*>AT1G73440.1 | Symbols:  | calmodulin-related | chr1:27615079-27615843 FORWARD
MARGESEGESSGSERESSSSSSGNESEPTKGTISKYEKQRLSRIAENKARLDALGISKAA
KALLSPSPVSKKRRVKRNSGEEDDDYTPVIADGDGDEDDDEVEEIDEDEEFLCKRKNKSS
ASKRKVSSRKILNTSVSLGEDDDDLDKAIALSLQGSVAGSDKEAATMKKKRPELMSKTQM
TQDELVMYFCQFDEGGKGFITLRDVAKMATVHDFTWTEEELQDMIRCFDMDKDGKLSLDE
FRKIVSRCRMLKGS*
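
One way to guard against condition 2 failing is to add the missing newline while concatenating. The following is only a minimal sketch: combine_fasta is a hypothetical helper name (not part of BLAST), and it assumes a POSIX shell with tail available.

```shell
# combine_fasta OUTPUT FILE...
# Hypothetical helper: concatenates FASTA files into OUTPUT and adds a
# newline after any input file whose last byte is not a newline.
combine_fasta() {
    out=$1
    shift
    : > "$out"                          # create or empty the output file
    for f in "$@"; do
        # Skip the output file itself in case the glob also matches it
        [ "$f" = "$out" ] && continue
        cat "$f" >> "$out"
        # tail -c 1 prints the last byte; command substitution strips a
        # trailing newline, so a non-empty result means it is missing.
        if [ -n "$(tail -c 1 "$f")" ]; then
            printf '\n' >> "$out"
        fi
    done
}
```

It would be called as: combine_fasta all_sequences_combined.fasta *.fasta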

Alex Van Dam

Mar 11, 2015, 2:53:55 PM
to unix-and-perl-...@googlegroups.com
ls -1 *my_file_handle | awk '{ print "blastp -options -db etc. -query "$1" -out "$1"_blastp_out &" }' > commands_for_multi_blast.sh

sh commands_for_multi_blast.sh

There are many flavors of this depending on your system, but this is basically how it's done. Depending on your system, you might not want the "&" at the end; you could put other code there instead, or nothing at all. The "&" runs each command in the background, which lets you run multiple jobs at the same time. I'm sure there are other ways to do this, but this is a "one-liner" way.
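
For the case without the "&", a plain loop runs one search at a time and avoids starting every job at once. This is a rough sketch rather than a tested pipeline: run_blast_each is a hypothetical helper name, the database name is whatever you pass in, and it assumes BLAST+ blastp (with its -query, -db and -out options) is on your PATH.

```shell
# run_blast_each DB FILE...
# Hypothetical helper: runs blastp once per query file, one after another,
# writing each result next to its input as <file>_blastp_out.
run_blast_each() {
    db=$1
    shift
    for f in "$@"; do
        # Add any other blastp options you need to this command line
        blastp -query "$f" -db "$db" -out "${f}_blastp_out"
    done
}
```

It would be called as: run_blast_each my_database *.fasta (my_database is a placeholder for your own database name).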

Cheers,

Alex



--
Mr. Alex Van Dam, Ph.D. 
NSF Postdoctoral Researcher
Frandsen Lab
Denmark Technical University
Department of Systems Biology
Søltofts plads, Building 223
2800 Kgs. Lyngby
Denmark
