Adding a description to a FASTA header


Seth Frietze

Dec 5, 2014, 1:33:30 PM
to biop...@googlegroups.com
Hello,
I need to edit the headers of a FASTA file. Specifically, I need to add a description to each header. For example, in the FASTA file below (about 50,000 sequences), I need to change the header >comp1_c0_seq1 len=262 path=[2229:0-261] to >comp1_c0_seq1 comp1_c0

In other words, I need to keep the SEQ_NAME (comp1_c0_seq1), add a space, and then add the same SEQ_NAME without the _seq1 suffix. The len= and path= fields can be kept or dropped.

I ran:
read_fasta -i test.fna | split_vals -k SEQ_NAME -d ' ' | rename_keys -k SEQ_NAME_0,SEQ_NAME | write_fasta -o out.fna -x 
which effectively deleted the len= and path= fields, but now I need to add the extra portion. I have the SEQ_NAME values and additional descriptions in a separate file. Can Biopieces do this and, if so, with which tool?

Thanks for the great tools!
Seth

>comp1_c0_seq1 len=262 path=[2229:0-261]
GAGATCTCTTTTTACTTAACGCTTAAACATTGAGATGTCAGGATAAGAGGAAGAACTGCA
GGCAGATTTTCAAGACGCCTCCTGGCAATCTGTTTGCTGTCAAAGTTAGAAACTATCAGA
ATAGTTAGAAACTATTGCTATTGGTAGTACATTATCACTAAAGGGGGCTTCTTTTTGCAT
ACCCCTTTGTCTTATGAAAAGGCTTGAACCCACCCTTCTTCATTCTTTAATTGGGAGGGG
GGAAAGAAGTGAAGAATTACTG
>comp3_c0_seq1 len=390 path=[37:0-389]
CCGTGCTTTTCCTTTAAGTGCACTACTTCAAAGAAATTTGGCTGAGTGGGCTTGGCTTTT
TTTAGACAATCTGTTATTGTGCTTTCAACTAAAAAGACACTGAATAAATTATAGATGCTG
GGTTCAGAGCTAAAAAGCAAATGAGCTTATTTGGTGGCTTCAT


Martin Asser Hansen

Dec 5, 2014, 1:49:26 PM
to biop...@googlegroups.com
There are probably many ways to do this. Here is one:

read_fasta -i test.fna | split_vals -k SEQ_NAME -d ' ' | split_vals -k SEQ_NAME_0 | merge_vals -k SEQ_NAME_0_0,SEQ_NAME_0_1 -d '_' | merge_vals -k SEQ_NAME_0,SEQ_NAME_0_0 -d ' ' | rename_keys -k SEQ_NAME_0,SEQ_NAME | write_fasta -x

>comp1_c0_seq1 comp1_c0
GAGATCTCTTTTTACTTAACGCTTAAACATTGAGATGTCAGGATAAGAGGAAGAACTGCAGGCAGATTTTCAAGACGCCTCCTGGCAATCTGTTTGCTGTCAAAGTTAGAAACTATCAGAATAGTTAGAAACTATTGCTATTGGTAGTACATTATCACTAAAGGGGGCTTCTTTTTGCATACCCCTTTGTCTTATGAAAAGGCTTGAACCCACCCTTCTTCATTCTTTAATTGGGAGGGGGGAAAGAAGTGAAGAATTACTG
>comp3_c0_seq1 comp3_c0
CCGTGCTTTTCCTTTAAGTGCACTACTTCAAAGAAATTTGGCTGAGTGGGCTTGGCTTTTTTTAGACAATCTGTTATTGTGCTTTCAACTAAAAAGACACTGAATAAATTATAGATGCTGGGTTCAGAGCTAAAAAGCAAATGAGCTTATTTGGTGGCTTCAT
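
For readers without Biopieces installed, the same header rewrite can be sketched in plain shell with awk. This is not part of the answer above and is not a Biopieces tool: the function name add_prefix_description is hypothetical, and the sketch assumes every sequence name has at least two "_"-separated parts (as in comp1_c0_seq1), mirroring how the pipeline splits the name on "_" and rejoins the first two pieces.

```shell
# Hypothetical helper (not a Biopieces command): rewrite each header
# ">name anything..." as ">name prefix", where prefix is the first two
# "_"-separated parts of name. Assumes names look like comp1_c0_seq1.
add_prefix_description() {
  awk '
    /^>/ {
      name = substr($1, 2)             # first word of the header, minus ">"
      split(name, parts, "_")          # comp1_c0_seq1 -> comp1, c0, seq1
      print ">" name " " parts[1] "_" parts[2]
      next
    }
    { print }                          # sequence lines pass through unchanged
  ' "$@"
}
```

It could be called as add_prefix_description test.fna > out.fna, or fed from a pipe. Unlike the output shown above, sequence lines keep whatever wrapping they had in the input.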

--
You received this message because you are subscribed to the Google Groups "biopieces" group.
To unsubscribe from this group and stop receiving emails from it, send an email to biopieces+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Seth Frietze

Dec 5, 2014, 2:15:32 PM
to biop...@googlegroups.com
Thanks Martin, 
the command worked beautifully!

Seth
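
The original question also mentioned descriptions kept in a separate file, which the pipeline in this thread does not use. A minimal awk sketch of that lookup follows. It is not a Biopieces tool, and everything about the file is an assumption: the function name add_file_description is hypothetical, and the descriptions file is assumed to be tab-separated with the sequence name in column 1 and the description in column 2.

```shell
# Hypothetical helper (not a Biopieces command).
# Usage: add_file_description DESCRIPTIONS.tsv FASTA...
# DESCRIPTIONS.tsv is assumed to hold "name<TAB>description" on each line.
add_file_description() {
  awk -F '\t' '
    FILENAME == ARGV[1] { desc[$1] = $2; next }   # first file: load lookup table
    /^>/ {
      split(substr($0, 2), words, " ")            # first word of header = name
      name = words[1]
      if (name in desc) print ">" name " " desc[name]
      else              print ">" name            # no entry: keep the bare name
      next
    }
    { print }                                     # sequence lines unchanged
  ' "$@"
}
```

For example, add_file_description descriptions.tsv test.fna > out.fna, where descriptions.tsv is a placeholder file name. Headers with no matching entry are written with the name only, so missing descriptions are easy to spot afterwards.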
