Hi everyone,
I have two questions.
I have a file with hundreds of motifs and a file with hundreds of fasta sequences. Is there a way to get the name of the fasta sequence next to the recognized motif?
Each fasta sequence is named with UCSC-style genomic coordinates, so fimo is able to parse the genomic coordinates. However, I would like to retain the UCSC-style name somewhere in the output file.
Right now my fimo command looks like this:
fimo --text --parse-genomic-coord --bgfile bg.txt copy_motifs/ABI5_col_v3h.txt fasta_peaks/ABI5_col_v3h.fasta > ABI5_col_v3h_m1_fimo.txt
And my output looks like so:
#pattern name sequence name start stop strand score p-value q-value matched sequence
bZIP_tnt.ABI5_col_v3h_m1 chr2 12108264 12108281 + 9.09524 1.86e-05 ATGATGCCACGTGTACTT
bZIP_tnt.ABI5_col_v3h_m1 chr2 12108266 12108283 - 17.3651 9.31e-07 TGAAGTACACGTGGCATC
bZIP_tnt.ABI5_col_v3h_m1 chr2 12108288 12108305 - 22.7937 9.32e-09 AGTTGCTGACGTGGCACT
Is there an option to put the name of the fasta sequence that each match came from (e.g. chr2:12108195-12108396) in the output file?
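If no such option exists, one possible workaround is to post-process the output and look up which fasta header contains each hit. The following is only a rough sketch under several assumptions: the function name annotate_fimo_hits is made up for illustration, the headers look like >chr2:12108195-12108396, the --text output is tab-delimited with the sequence name, start and stop in columns 2 to 4, and chromosome names contain no ":" or "-". If peaks overlap, it reports only the first header that contains the hit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the original fasta header to each fimo hit.
# Usage: annotate_fimo_hits peaks.fasta fimo_output.txt > annotated.txt
annotate_fimo_hits() {
    local fasta="$1"
    local fimo_out="$2"
    awk -F'\t' -v OFS='\t' '
        # First file (fasta): remember every header and its coordinates.
        NR == FNR {
            if (substr($0, 1, 1) == ">") {
                name = substr($0, 2)
                sub(/[ \t].*/, "", name)      # drop any description after the name
                split(name, p, /[-:]/)        # chr2:12108195-12108396 -> chr2, start, end
                n++
                id[n] = name
                chrom[n] = p[1]
                lo[n] = p[2] + 0
                hi[n] = p[3] + 0
            }
            next
        }
        # Second file (fimo output): extend the header line with a new column.
        /^#/ { print $0, "source sequence"; next }
        # For each hit, find the first fasta record whose range contains it.
        {
            src = "NA"
            for (i = 1; i <= n; i++) {
                if ($2 == chrom[i] && $3 + 0 >= lo[i] && $4 + 0 <= hi[i]) {
                    src = id[i]
                    break
                }
            }
            print $0, src
        }
    ' "$fasta" "$fimo_out"
}
```

With the files from the command above, this would be called as annotate_fimo_hits fasta_peaks/ABI5_col_v3h.fasta ABI5_col_v3h_m1_fimo.txt, and the UCSC-style name would appear as an extra last column.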
Also, I was wondering if anyone has experience running fimo on multiple motif files and sequence databases. Right now I have one folder with 100 motif files and another folder with 100 sequence databases. The corresponding files in the two folders share the same base name and differ only in their suffixes (*.motifs and *.fasta). Is there a way to automate this? I'm thinking along the lines of for f in /copy peaks... but I don't know how to incorporate two files.
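One way to handle two files per iteration is to loop over just one folder and derive the matching file name in the other folder from the shared base name. This is a minimal sketch, not a tested pipeline: the function name run_fimo_pairs and the output naming (NAME_fimo.txt) are made up for illustration, and it assumes the motif files end in .motifs (adjust the suffix if they are really .txt, as in the command above).

```shell
#!/usr/bin/env bash
# Hypothetical helper: run fimo once per motif/fasta pair sharing a base name.
# Usage: run_fimo_pairs MOTIF_DIR FASTA_DIR OUT_DIR BGFILE
run_fimo_pairs() {
    local motif_dir="$1"
    local fasta_dir="$2"
    local out_dir="$3"
    local bgfile="$4"
    local motif base fasta

    mkdir -p "$out_dir"
    for motif in "$motif_dir"/*.motifs; do
        [ -e "$motif" ] || continue           # the glob matched nothing
        base=$(basename "$motif" .motifs)     # e.g. ABI5_col_v3h
        fasta="$fasta_dir/$base.fasta"        # partner file with the same base name
        if [ ! -f "$fasta" ]; then
            echo "skipping $base: $fasta not found" >&2
            continue
        fi
        # Same options as the single-file command, one output file per pair.
        fimo --text --parse-genomic-coord --bgfile "$bgfile" \
            "$motif" "$fasta" > "$out_dir/${base}_fimo.txt"
    done
}
```

For the layout described above, the call would look like run_fimo_pairs copy_motifs fasta_peaks fimo_out bg.txt, where fimo_out is an assumed output folder name. Motif files without a matching fasta file are skipped with a message on stderr rather than stopping the loop.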
Thanks!
Alex