Genedoc

Tijesunimi Odebode

May 10, 2013, 7:57:45 AM
to comp...@magpie.bio.indiana.edu
Good morning,

I am a graduate student and a first-time user of GeneDoc. I tried importing a .mfa (multi-FASTA) alignment file into GeneDoc, but I got an error message saying "duplicate sequence name found." How do I fix this problem? I would really appreciate any help. Thanks in advance. Here is part of the file:

TGGATCTGCGGGAGCGTGAGCGGTTGCGCGGACAGCGCCTGCAGGTCGGCCGTGAAGTCA
AGGATGCCTTCTTGTGTTCCGGCGGCCAGGGCATCGGCGATGACCTGAGGCGGCACGTTC
GGCCACAGCCCGAACGGCGTTCGCACATCGGCGTAGCTCGTCGAGTAGCCGTAGTTCGGG
TCGCCGTAGCCCAGGTTGACGATCACC
= score = 59667  type = DM  L1 = 4392353  L2 = 4345492  AL1 = 59664  AL2 = 59636  P_ID = 99.89
>379026087 AP012340:177550-178025 (+)
TGTGCCGGTGTGAGGTCCGCATACGTGGTGTGCACCGTGAGTATCCCGAATACTGCGTTG
ATATCGGACAGGACATTGAGTGGATACCGCGGGAAGTCGGCGAAACCGTCGTACTCGAGG
GTGTAGGTCGTCGTCGGATAGGGATTGTCCGGGG---TCGCCCCGTAGAACGGTAGGCCG
AGGGTGGTGACATTCAGACCGGGTATGCGCGCAAGTATCCCGCCATTGGGATTCATCTCG
TTGCCGATCAAGATGAAATTGAGCTGGCTGGGGCTGGGAGCGTTGGGACCCAGCGAGATG
AGGTGCTGCATTTCCAGGGACGCGATGACGGCGCTCTGCGAATAGCCGAACACGGTGACG
TGGTTTCCGGCGTTGATTTGCTC--CCA-AATCGCGCCGTCGAGAATCTGTAGGCCCAAC
TGCACCGAGGTTTGGAAGGGCAGGGATTTGACGCCGGTGATCGGATATAGCTCTTCGGGC
GT
>31742509 BX248333:180260-180741 (+)
TGCGCGGGCGTGAGGTCCAAATACTTGGTGTGTACGAATGTGATGCCTGCAACCGCGTTG
AGGTCGGAAATGAAGTTGAGCGGGTATCGCGAGAAGTCGGCGAACCCGTCGTACTCGAGC
GTGTAGATGGCCGTCGGATAGATCGTGTCCGAGGGCGTTGCGCCATAGAACGTCAGGTCC
AGAGTCGGAAGCGTCAGATCCGGGAACCGCGCGAGCATACCGCCATTGGGGTTCATTTCA
TTGCCGACAAGCACGAAATTGAGGTCGCTCGCCGAAGGTGCGGCCCCGCCCATCGCCGTG
AACCTCTGCATCTCCAGCGACGCGATTATGGCGCTTTGCGACCAGCCGAAAACGGTGACC
GCGTTTCCGGTGGTCGCGAGCTCTACCATGATCGCGTCGTGCAAGATGGTCAAGCCCTCT
TCCACTGACGTGTTGAGGACCAAACTTCTGACACCGGTGAGTGGGTACAACTCTTCGGGT
GT
= score = 482  type = M2  L1 = 4392353  L2 = 4345492  AL1 = 476  AL2 = 482  P_ID = 69.25
>379026087 AP012340:179382-179864 (+)
CTGCGCGGGCGTGAGGTCCAAATACTTGGTGTGTACGAATGTGATGCCTGCAACCGCGTT
GAGGTCGGAAATGAAGTTGAGCGGGTATCGCGAGAAGTCGGCGAACCCGTCGTACTCGAG
CGTGTAGATGGCCGTCGGATAGATCGTGTCCGAGGGCGTTGCGCCATAGAACGTCAGGTC
CAGAGTCGGAAGCGTCAGATCCGGGAACCGCGCGAGCATACCGCCATTGGGGTTCATTTC
ATTGCCGACAAGCACGAAATTGAGGTCGCTCGCCGAAGGTGCGGCCCCGCCCATCGCCGT
GAACCTCTGCATCTCCAGCGACGCGATTATGGCGCTTTGCGACCAGCCGAAAACGGTGAC
CGCGTTTCCGGTGGTCGCGAGCTCTACCATGATCGCGTCGTGCAAGATGGTCAAGCCCTC
TTCCACTGACGTGTTGAGGACCAAACTTCTGACACCGGTGAGTGGGTACAACTCTTCGGG
TGT
>31742509 BX248333:178426-178902 (+)
CTGTGCCGGTGTGAGGTCCGCATACGTGGTGTGCACCGTGAGTATCCCGAATACTGCGTT
GATATCGGACAGGACATTGAGTGGATACCGCGGGAAGTCGGCGAAACCGTCGTACTCGAG
GGTGTAGGTCGTCGTCGGATAGGGATTGTCCGGGG---TCGCCCCGTAGAACGGTAGGCC
GAGGGTGGTGACATTCAGACCGGGTATGCGCGCAAGTATCCCGCCATTGGGATTCATCTC
GTTGCCGATCAAGATGAAATTGAGCTGGCTGGGGCTGGGAGCGTTGGGACCCAGCGAGAT
GAGGTGCTGCATTTCCAGGGACGCGATGACGGCGCTCTGCGAATAGCCGAACACGGTGAC


Tijesunimi Odebode

Ivan Erill

May 10, 2013, 1:28:40 PM
to Tijesunimi Odebode, comp...@magpie.bio.indiana.edu
Many bioinformatics programs (e.g. CLUSTALW) will take only the first 10
characters of a FASTA header line as the sequence name.
So
>379026087 AP012340:177550-178025 (+)
and
>379026087 AP012340:179382-179864 (+)
are actually the same name as far as the program is concerned.
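
One possible workaround is to rewrite the header lines so that every name differs within its first 10 characters. The following is only a minimal sketch, not a GeneDoc feature: the function name `uniquify_fasta_headers` is made up for illustration, and it assumes the importing program only looks at the leading characters after the `>`. It prefixes each header with a running index and leaves sequence lines (and any other non-header lines) untouched.

```python
def uniquify_fasta_headers(lines):
    """Return a copy of `lines` with a running index prefixed to each header.

    Hypothetical helper: header lines are those starting with '>'.
    The index and an underscore go at the very front of the name, so the
    names stay distinct even if a program truncates them to 10 characters
    (this holds as long as the index plus underscore fits in 10 characters).
    """
    out = []
    index = 0
    for line in lines:
        if line.startswith(">"):
            index += 1
            # e.g. ">379026087 AP012340:..." becomes ">1_379026087 AP012340:..."
            line = f">{index}_{line[1:]}"
        out.append(line)
    return out


def rewrite_file(src_path, dst_path):
    """Read src_path, make the headers unique, and write the result to dst_path."""
    with open(src_path) as src:
        lines = src.read().splitlines()
    with open(dst_path, "w") as dst:
        dst.write("\n".join(uniquify_fasta_headers(lines)) + "\n")
```

For example, calling `rewrite_file("input.mfa", "renamed.mfa")` (both file names are placeholders) would write a copy whose two `379026087` headers start with `1_` and `3_` respectively, so they no longer share a 10-character prefix. Keep the original file, since the new names no longer match the original identifiers exactly.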

Ivan Erill