Annotate circular DNA

denco...@gmail.com

Feb 9, 2017, 12:36:32 AM

to pydna

Hi,
Sequences in pydna can be marked as circular, but is pydna able to find sequences/strings that 'run' over the last and first nucleotide of the sequence?
I'm looking to auto-annotate plasmids using an in-house feature database.
Best regards,
Pieter

Björn Johansson

Feb 9, 2017, 6:18:37 AM

to pydna, denco...@gmail.com

Hi, thanks for your interest in pydna.

It does not at the moment, but there is code in the pydna.amplify module that is able to find primers that "run over the edge", so it would not be hard to implement it.

I have thought of this myself, but I didn't have any need for it so far.

What would you like to do, i.e. what kind of result should be returned?

Another thing to consider is whether an exact match is required. Some software, such as PlasMapper, seems to allow small sequence divergence.

/bjorn
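
The tolerance for small sequence divergence mentioned above can be sketched with a mismatch count (Hamming distance) over a circular sequence. This is only a minimal sketch on plain strings, not how PlasMapper or pydna work; `find_approx_circular` and `max_mismatches` are hypothetical names, and insertions or deletions are not handled.

```python
def find_approx_circular(target, site, max_mismatches=1):
    """Return (start, mismatches) for every position where site matches
    the circular target with at most max_mismatches substitutions."""
    target = str(target).upper()
    site = str(site).upper()
    n, k = len(target), len(site)
    if k == 0 or k > n:
        return []
    extended = target + target[:k - 1]  # allow wrapping around the origin
    hits = []
    for start in range(n):
        window = extended[start:start + k]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Exact hit that starts at position 6 and runs over the origin
print(find_approx_circular("GATCAAAA", "AAGAT", max_mismatches=0))  # [(6, 0)]
# Same position, with one substitution tolerated
print(find_approx_circular("GATCAAAA", "AAGTT", max_mismatches=1))  # [(6, 1)]
```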

Mark Budde

Feb 9, 2017, 12:35:20 PM

to Björn Johansson, pydna, denco...@gmail.com

This is what I use:

from re import finditer

# rc() is not defined in the original post; pydna.utils.rc (reverse complement) is assumed
from pydna.utils import rc


def find_sites(target, site):
    """Return (position, strand) hits of site in target, treating target as circular."""
    if hasattr(target, '_seq'):
        target = target.seq  # convert to Dseq from Dseqrecord
    target = str(target).upper()
    target += target[:len(site) - 1]  # allow wrapping around origin
    site = str(site).upper()
    ss = '(?=' + site + ')'  # lookahead, so overlapping matches are found too
    hits = [(m.start(), 1) for m in finditer(ss, target)]  # matches on the sense strand
    ss = '(?=' + str(rc(site)) + ')'
    hits += [(m.start(), -1) for m in finditer(ss, target)]  # antisense matches
    returns = []
    for hit in hits:
        if hit[0] not in [x[0] for x in returns]:  # remove palindromic hits
            returns.append(hit)
    return returns
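
The `(?=...)` lookahead in the pattern is what lets `finditer` report overlapping matches, and the appended prefix is what lets a match span the origin. A small sketch with plain strings (the sequences are made up for illustration):

```python
from re import finditer

target = "AAAA"
site = "AA"

# A plain pattern consumes each match, so overlapping hits are skipped.
plain = [m.start() for m in finditer(site, target)]

# A zero-width lookahead matches at every position where the site starts.
overlapping = [m.start() for m in finditer("(?=" + site + ")", target)]

# Appending the first len(site) - 1 characters exposes hits across the origin.
circular = target + target[:len(site) - 1]
wrapped = [m.start() for m in finditer("(?=" + site + ")", circular)]

print(plain)        # [0, 2]
print(overlapping)  # [0, 1, 2]
print(wrapped)      # [0, 1, 2, 3]
```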

Björn Johansson

Feb 14, 2017, 11:40:54 AM

to pydna, bjor...@gmail.com, denco...@gmail.com

Hi Mark,

That looks like it should work; I am thinking of implementing something based on it.

I see you specify the site as a string; I find it convenient to use the Bio.Restriction module.

The circular Dseqrecords already allow cutting "over the edge", but they return only the fragments, of course.

cheers,

Bjorn

Björn Johansson

Feb 14, 2017, 11:58:18 AM

to pydna, denco...@gmail.com

Could you give some more information on the nature of the features (for example, whether they are continuous) and on what kind of output you would like?

Mark Budde

Feb 14, 2017, 11:59:52 AM

to Björn Johansson, pydna, Pieter Coussement

How do you use the Bio.Restriction module in this case? I always cast to a string and convert it with upper(), since Seq objects are case-sensitive.

Thanks,

Mark

Björn Johansson

Feb 14, 2017, 12:15:13 PM

to pydna, bjor...@gmail.com, denco...@gmail.com

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# Using the Bio.Restriction.Analysis class:

from Bio.Restriction import Analysis, BamHI
from pydna.dseqrecord import Dseqrecord

s = Dseqrecord("GGATCCggatcc")

# cut the linear record at both BamHI sites, giving three fragments
a, b, c = s.cut(BamHI)

print(a.seq.fig())
print("\n")
print(b.seq.fig())
print("\n")
print(c.seq.fig())

# list the BamHI sites found in the sequence
ana = Analysis([BamHI], s.seq)
ana.print_that()

Björn Johansson

Feb 16, 2017, 9:29:05 AM

to pydna, denco...@gmail.com

I made this based on Mark Budde's code:

def find_subsequence(s, subs):
    
    if s.linear:
        s = str(s.seq).upper()
    else:
        s = str(s.seq).upper()+str(s.seq).upper()[:len(subs)-1] #allow wrapping around origin
    
    subs = str(subs.seq).upper()
 
    return s.find(subs)

from pydna.dseqrecord import Dseqrecord

x=Dseqrecord("gatc", circular=True)
y=Dseqrecord("tcg")

print(find_subsequence(x,y))

x=Dseqrecord("gatc", linear=True)
y=Dseqrecord("tcg")

print(find_subsequence(x,y))

If there is interest, this could become a method for Dseqrecords, but only if it is useful. I am a bit afraid of feature creep...
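
For the auto-annotation use case from the original question, a start index like the one returned above can be turned into feature coordinates, splitting a hit that crosses the origin into two parts. The following is a minimal sketch on plain strings, assuming exact matches only; `annotate_circular`, `reverse_complement`, and the feature dictionary are hypothetical and not part of pydna.

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def annotate_circular(plasmid, features):
    """Map feature names to lists of (parts, strand) for a circular plasmid.

    parts is a list of half-open (start, end) intervals; a hit that crosses
    the origin is reported as two intervals.
    """
    plasmid = plasmid.upper()
    n = len(plasmid)
    annotations = {}
    for name, feature in features.items():
        feature = feature.upper()
        k = len(feature)
        if k == 0 or k > n:
            continue
        extended = plasmid + plasmid[:k - 1]  # allow wrapping around the origin
        queries = [(feature, 1)]
        rc = reverse_complement(feature)
        if rc != feature:  # a palindromic feature is reported once
            queries.append((rc, -1))
        for query, strand in queries:
            start = extended.find(query)
            while start != -1:
                end = start + k
                if end <= n:
                    parts = [(start, end)]
                else:
                    parts = [(start, n), (0, end - n)]
                annotations.setdefault(name, []).append((parts, strand))
                start = extended.find(query, start + 1)
    return annotations


# Made-up 10 bp "plasmid" and two made-up features
features = {"wrap": "AAATT", "rev": "GGGTC"}
result = annotate_circular("TTGACCCAAA", features)
print(result)
# {'wrap': [([(7, 10), (0, 2)], 1)], 'rev': [([(2, 7)], -1)]}
```

Reporting the wrapped hit as two intervals mirrors how a feature spanning the origin is usually written as a joined location, but the output format here is only one possible choice.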
