from pygr import blast
operon =
seqdb.SequenceFileDB('OPERONprobes.fasta')
blastmap = blast.BlastMapping(operon)
affy = seqdb.SequenceFileDB('ATH1probes.fasta')
numberprobe = len(affy)
nameprobe = affy.keys()[i]
probemap = affy[nameprobe]
try:
fprobe =
blastmap[probemap]
edges.extend([fprobe.edges()])
del fprobe
except NameError:
edges = fprobe.edges()
except:
pass
seq_db = dict(seq1=affy[affy.keys()[1]], seq2=affy[affy.keys()[2]], ...)
kwargs = {"seq":'operon',
"al":None, "blastpath":'blastall', "blastprog":'blastn',
"expmax":0.001, "maxseq":None, "verbose":None, "queryDB":'seq_db'}
blastmap = blast.BlastMapping(operon,**kwargs)
> 'OPERONprobes.fasta' contains the operon oligos, it's a fasta file with 29000 sequences, and the 'ATH1probes.fasta' contains the affymetrix oligos in fasta format, it contains 250 000 oligos sequences.
> It works this way, however it's rather slow, the complete mapping will take around 40 hours.
> Lookink in the tutorial on the blast module I found that it is possible to pass a 'queryDB' argument in the form of a dictionary containing multiple sequences to be used as queries
>
> Useful methods:
> BlastMapping.__call__(seq, al=None, blastpath='blastall', blastprog=None, expmax=0.001, maxseq=None, opts=", verbose=None, queryDB=None)
>
> I tried several things but I couldn't make it work. I have several questions on how to use this function :
> How to make the dictionary ? In my case should this dictionnary be comprised of the sequence I want to use to query or to map to, meaning should this dictionnary contains the sequence of the Operon arrays or of the Affymetrix arrays ?
> In the kwargs argument, the seq should be the operon I guess ?
> After having done the BlastMapping using the blastmap = blast.BlastMapping(operon,**kwargs), how to query this blast ?
> Do I have to query it sequence by sequence or can I query it using the dictionnary I used as an argument in the kwargs ?
Here's an example that I've abstracted from one of our BLAST tests in tests/blast_test.py (see get_multiblast_results() or test_multiblast_long() for another example):
protDB = seqdb.SequenceFileDB(sp_hbb1)
blastmap = blast.BlastMapping(protDB, verbose=False)
al = cnestedlist.NLMSA('blasthits', 'memory', pairwiseMode=True,
bidirectional=False)
blastmap(al=al, queryDB=protDB) # all vs all
al.build() # construct the alignment indexes
for seq in protDB.values(): # look at hits for one sequence at a time...
results = al[seq]
A few notes:
- the sequence "dictionary" should be a sequence database object. Just pass your "affy" SequenceFileDB object.
- blastmap() stores all the results in a single NLMSA alignment. In the example above the NLMSA is opened in "memory" mode (i.e. stored in RAM); if you want the results stored on disk instead use mode 'w'.
- Note the NLMSA must be created in pairwiseMode=True to reflect the nature of blast mappings (i.e. blast is a pairwise alignment, not a true multiple sequence alignment).
- you access the results from the NLMSA in the usual way as illustrated above...
Hope that helps!
-- Chris Lee
thanks for your answer. I tried with your settings but it doesn't help
to make the query faster, so I guess I just have to wait !
thanks again,
all the best,
Julien