print(myseq.circular)
print(myseq_long.circular)
Gives:
True
False
The two files are identical except for their LOCUS lines.
# This gets parsed correctly and shows up as circular
T5pr-6xHis-MBP-HRV_3C-fzq1-spprot-2xHJHJNLS-Link8-MIND10SO-mungedheader.gb
LOCUS T5pr-6xHis 9838 bp ds-DNA circular UNA 03-JAN-2019
# This does not get parsed as circular
T5pr-6xHis-MBP-HRV_3C-fzq1-spprot-2xHJHJNLS-Link8-MIND10SO.gb
LOCUS T5pr-6xHis-MBP-HRV_3C-fzq1-spprot-2xHJHJNLS-Link8-MIND10SO 9838 bp DNA circular UNA 03-JAN-2019
Both of their names get parsed correctly by Biopython (see below).
I was wondering how I can change the pyDNA code to handle both files correctly, so that I do not have to round-trip the sequence through Biopython and then into pyDNA.
I did see some code that uses pyparsing to handle the parsing, but was hoping for some pointers on where to make changes.
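To make the idea concrete, here is a minimal sketch of the kind of tolerant check I have in mind. It is not pyDNA's actual code, and `locus_is_circular` is a hypothetical helper name. It assumes the problem is related to the length of the locus name, so it splits the LOCUS line on whitespace instead of relying on where the fields sit in the line:

```python
def locus_is_circular(locus_line: str) -> bool:
    """Return True if a GenBank LOCUS line declares circular topology.

    Hypothetical helper: splits on whitespace rather than using fixed
    column positions, so an over-long locus name cannot shift the
    topology field out of place.
    """
    tokens = locus_line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    # Skip "LOCUS" and the name itself so a name cannot be misread as topology.
    return any(token.lower() == "circular" for token in tokens[2:])


# The two LOCUS lines from the files above.
short_locus = "LOCUS T5pr-6xHis 9838 bp ds-DNA circular UNA 03-JAN-2019"
long_locus = (
    "LOCUS T5pr-6xHis-MBP-HRV_3C-fzq1-spprot-2xHJHJNLS-Link8-MIND10SO "
    "9838 bp DNA circular UNA 03-JAN-2019"
)

print(locus_is_circular(short_locus))
print(locus_is_circular(long_locus))
```

With this approach both lines are reported as circular, regardless of how long the name is.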
Really like the pyDNA package and would love to contribute my change back if successful.
Thanks
Hari
Biopython handling of the files:
from Bio import SeqIO

files = [
    "T5pr-6xHis-MBP-HRV_3C-fzq1-spprot-2xHJHJNLS-Link8-MIND10SO-mungedheader.gb",
    "T5pr-6xHis-MBP-HRV_3C-fzq1-spprot-2xHJHJNLS-Link8-MIND10SO.gb",
]
for _file in files:
    seqr = SeqIO.parse(_file, format="genbank")
    for _rec in seqr:
        print(_rec.name)
T5pr-6xHis
T5pr-6xHis-MBP-HRV_3C-fzq1-spprot-2xHJHJNLS-Link8-MIND10SO