Annotate circular DNA

denco...@gmail.com

Feb 9, 2017, 12:36:32 AM
to pydna
Hi,
Sequences in pydna can be marked as circular, but is pydna able to find sequences/strings that 'run' over the last and first nucleotide of the sequence?
I'm looking to auto annotate plasmids using an in-house feature database.
Best regards
Pieter

Björn Johansson

Feb 9, 2017, 6:18:37 AM
to pydna, denco...@gmail.com
Hi, thanks for your interest in pydna.

It does not at the moment, but there is code in the pydna.amplify module that is able to find 
primers that "run over the edge", so it would not be hard to implement it.

I have thought of this myself, but I didn't have any need for it so far.

What would you like to do, i.e. what kind of result should be returned?
Another thing to consider is whether an exact match is required. Some software, such as PlasMapper,
seems to allow small sequence divergence.

/bjorn
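
To make the exact-versus-approximate question concrete, here is a minimal sketch in plain Python (no pydna calls) of a mismatch-tolerant search on a circular sequence. The function name `find_circular_approx` and the `max_mismatches` parameter are made up for this illustration. It only counts substitutions (Hamming distance), not insertions or deletions, and it makes no claim about how PlasMapper handles divergence.

```python
def find_circular_approx(plasmid, feature, max_mismatches=1):
    """Return (start, mismatches) for every window of the circular plasmid
    that differs from feature by at most max_mismatches substitutions."""
    plasmid = plasmid.upper()
    feature = feature.upper()
    n, k = len(plasmid), len(feature)
    if k == 0 or k > n:
        return []
    # append the first k-1 bases so that windows can run over the origin
    extended = plasmid + plasmid[:k - 1]
    hits = []
    for start in range(n):
        window = extended[start:start + k]
        mismatches = sum(a != b for a, b in zip(window, feature))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# "tcg" spans the origin of circular "gatc" (positions 2, 3 and 0)
print(find_circular_approx("gatc", "tcg", max_mismatches=0))
print(find_circular_approx("gatc", "tcc", max_mismatches=1))
```

With `max_mismatches=0` this reduces to an exact search, so the same function could cover both cases if a tolerance is wanted later.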

Mark Budde

Feb 9, 2017, 12:35:20 PM
to Björn Johansson, pydna, denco...@gmail.com
This is what I use:

from re import finditer

# rc() was not defined in the post; Biopython's reverse_complement is assumed here
from Bio.Seq import reverse_complement as rc


def find_sites(target, site):
    """Return (start, strand) hits of site in target, wrapping around the origin."""
    if hasattr(target, '_seq'):
        target = target.seq  # convert to Dseq from Dseqrecord
    target = str(target).upper()
    target += target[:len(site) - 1]  # allow wrapping around origin
    site = str(site).upper()
    # zero-width lookahead, so overlapping matches are reported as well
    hits = [(m.start(), 1) for m in finditer('(?=' + site + ')', target)]  # sense strand
    hits += [(m.start(), -1) for m in finditer('(?=' + rc(site) + ')', target)]  # antisense strand
    returns = []
    for hit in hits:
        if hit[0] not in [x[0] for x in returns]:  # remove palindromic hits
            returns.append(hit)
    return returns
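
A start position found this way may belong to a site that runs over the origin, and for annotation such a hit has to be split into two linear intervals. The following is only a sketch in plain Python under that assumption; `hit_to_intervals` is a hypothetical helper, not part of pydna or Biopython, and it uses zero-based, end-exclusive coordinates.

```python
def hit_to_intervals(start, site_length, seq_length):
    """Split a hit on a circular sequence into linear (start, end) intervals.

    Coordinates are zero-based and end-exclusive. A hit that fits before the
    end of the sequence gives one interval; a hit that runs over the origin
    gives two.
    """
    end = start + site_length
    if end <= seq_length:
        return [(start, end)]
    return [(start, seq_length), (0, end - seq_length)]


# a 3 bp site starting at position 2 of a 4 bp circular sequence
print(hit_to_intervals(2, 3, 4))  # [(2, 4), (0, 1)]
# the same site starting at position 0 does not wrap
print(hit_to_intervals(0, 3, 4))  # [(0, 3)]
```

The two intervals could then be turned into whatever compound location type the annotation format expects.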

Björn Johansson

Feb 14, 2017, 11:40:54 AM
to pydna, bjor...@gmail.com, denco...@gmail.com
Hi Mark,

That looks like it should work; I am thinking of implementing something based on it.
I see you specify the site as a string. I find it convenient to use the Bio.Restriction module.

Circular Dseqrecords can already be cut "over the edge", but of course they return only the fragments.
Cheers,
Bjorn


Björn Johansson

Feb 14, 2017, 11:58:18 AM
to pydna, denco...@gmail.com
Could you give some more information about the nature of the features (for example, whether they are continuous)
and what kind of output you would like?

Mark Budde

Feb 14, 2017, 11:59:52 AM
to Björn Johansson, pydna, Pieter Coussement
How do you use the Bio.Restriction module in this case? I always cast to a string and convert it with upper(), since Seq objects are case sensitive.
Thanks,
Mark

Björn Johansson

Feb 14, 2017, 12:15:13 PM
to pydna, bjor...@gmail.com, denco...@gmail.com
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Using Dseqrecord.cut and the Bio.Restriction Analysis class:

from Bio.Restriction import Analysis, BamHI
from pydna.dseqrecord import Dseqrecord

s = Dseqrecord("GGATCCggatcc")  # one upper-case and one lower-case BamHI site

a, b, c = s.cut(BamHI)  # two cuts in a linear sequence give three fragments

print(a.seq.fig())
print("\n")
print(b.seq.fig())
print("\n")
print(c.seq.fig())

ana = Analysis([BamHI], s.seq)

ana.print_that()  # print the sites found by the analysis

Björn Johansson

Feb 16, 2017, 9:29:05 AM
to pydna, denco...@gmail.com
I made this based on Mark Budde's code:

def find_subsequence(s, subs):
    if s.linear:
        s = str(s.seq).upper()
    else:
        s = str(s.seq).upper()+str(s.seq).upper()[:len(subs)-1] #allow wrapping around origin
    subs = str(subs.seq).upper()
    return s.find(subs)

from pydna.dseqrecord import Dseqrecord

x = Dseqrecord("gatc", circular=True)
y = Dseqrecord("tcg")

print(find_subsequence(x,y))

x = Dseqrecord("gatc", linear=True)
y = Dseqrecord("tcg")

print(find_subsequence(x,y))

If there is interest, this could make it into a method for Dseqrecords, but only if it is useful. I am a bit afraid of feature creep....