Dear Don and other readers,
I have a question and I’m not sure if I did something incorrectly. I hope you can help me.
I have combined assemblies from two different assemblers (SPAdes and Trinity) for 15 replicates of a sea anemone species. Each group of three replicates corresponds to a different thermal stress condition (26°C to 35°C) across five time points. I used tr2aacds.pl to combine all of these assemblies (n = 30) into a pan-transcriptome reference. The output is currently very large, with the following results:
My goal is to obtain a reference transcriptome that can later be used for mapping reads with Salmon and for differential expression analysis with DESeq2.
My questions are:
1. How can I reduce this high percentage of duplicates? Would it be a good idea to keep only the 123,447 transcripts categorized as “main”?
2. Do you think the high level of duplication (90.6%) could affect downstream quantification and differential expression analysis? If so, what strategies would you recommend to reduce redundancy or manage isoforms in this context?
I would greatly appreciate any guidance.
Best regards,
evigene/scripts/omcl/evg_buscogenesum.pl
usage:
env dotab=1 summary=busco.sum.txt evg_buscogenesum.pl buscof/full_table*.tsv
where 'summary=busco.sum.txt' names the summary output file, and 'dotab=1' means
rewrite each BUSCO full_table.tsv, changing spurious 'Duplicate' to 'Complete' where
the duplicated hits are alternates of one gene locus.
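The relabeling idea can be sketched in Python as follows. This is not evg_buscogenesum.pl itself, only a minimal illustration under two assumptions: BUSCO's full_table.tsv has '#' comment lines followed by tab-separated rows whose first three columns are Busco id, Status and Sequence (BUSCO writes the status as 'Duplicated'), and the gene locus of a transcript is its ID with the trailing 't<N>' alternate suffix removed. The function names and the transcript IDs in the example are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: transcript IDs end in 't<N>' (t1 = main, t2..tN = alternates),
# so the gene locus ID is the transcript ID with that suffix stripped.
ALT_SUFFIX = re.compile(r"t\d+$")


def locus_of(transcript_id):
    """Strip the trailing t<N> alternate suffix to get the gene locus ID."""
    return ALT_SUFFIX.sub("", transcript_id)


def relabel_alt_duplicates(full_table_lines):
    """Return {busco_id: status}, changing 'Duplicated' to 'Complete'
    when every duplicated hit is an alternate of the same gene locus.

    Assumes '#' comment lines, then tab-separated rows starting with
    Busco id, Status, Sequence (one row per hit for Duplicated BUSCOs).
    """
    status = {}
    dup_loci = defaultdict(set)
    for line in full_table_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        busco_id, stat = cols[0], cols[1]
        status[busco_id] = stat
        if stat == "Duplicated" and len(cols) > 2:
            dup_loci[busco_id].add(locus_of(cols[2]))
    for busco_id, loci in dup_loci.items():
        if len(loci) == 1:  # all hits are alternates of one locus
            status[busco_id] = "Complete"
    return status


if __name__ == "__main__":
    example = [
        "# hypothetical full_table.tsv excerpt",
        "100at6447\tDuplicated\tanemEVm000123t1\t812.3\t410",
        "100at6447\tDuplicated\tanemEVm000123t4\t790.1\t402",
        "200at6447\tDuplicated\tanemEVm000500t1\t650.0\t300",
        "200at6447\tDuplicated\tanemEVm009876t1\t640.0\t298",
        "300at6447\tMissing",
    ]
    print(relabel_alt_duplicates(example))
```

Here 100at6447 becomes 'Complete' because both hits are alternates of one locus, while 200at6447 stays 'Duplicated' because its hits come from two different loci.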
One output of tr2aacds is a table of gene loci and their alternate transcript IDs, which may help.
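For Salmon and DESeq2, a locus-to-transcript table like this can be used to quantify against all transcripts and then summarize counts per gene locus, so alternates are not tested as separate genes. The Python sketch below assumes the locus can be derived from the 't<N>' ID suffix rather than parsing the tr2aacds table (whose exact columns are not shown here), and that Salmon's quant.sf is tab-separated with the header Name, Length, EffectiveLength, TPM, NumReads. It only sums NumReads; tximport in R does this summarization more completely (it also supplies average transcript lengths as offsets). The function names are hypothetical.

```python
import csv
import io
import re
from collections import defaultdict

ALT_SUFFIX = re.compile(r"t\d+$")  # assumed t1..tN alternate suffix


def tx2gene(transcript_ids):
    """Map each transcript ID to its gene locus by dropping the t<N> suffix."""
    return {tx: ALT_SUFFIX.sub("", tx) for tx in transcript_ids}


def gene_counts_from_quant(quant_sf_text):
    """Sum Salmon NumReads over all alternates of each gene locus.

    quant_sf_text: contents of a Salmon quant.sf file (tab-separated,
    header: Name Length EffectiveLength TPM NumReads).
    """
    reader = csv.DictReader(io.StringIO(quant_sf_text), delimiter="\t")
    rows = list(reader)
    mapping = tx2gene(row["Name"] for row in rows)
    counts = defaultdict(float)
    for row in rows:
        counts[mapping[row["Name"]]] += float(row["NumReads"])
    return dict(counts)


if __name__ == "__main__":
    quant = (
        "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
        "anemEVm000123t1\t1500\t1320.5\t10.0\t200.0\n"
        "anemEVm000123t2\t1400\t1220.5\t2.5\t50.5\n"
        "anemEVm000500t1\t900\t720.0\t1.0\t12.0\n"
    )
    print(gene_counts_from_quant(quant))
```

Written out as a two-column transcript/gene table, the same mapping is the kind of tx2gene input that tximport takes before building a DESeq2 dataset with DESeqDataSetFromTximport.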
You can create a separate sequence set of only the main transcripts, i.e. all IDs with suffix 't1',
but the reason for testing all transcripts for homology (BUSCO or other) is that alternates 't2..tN'
sometimes have much greater homology than the 't1' longest-protein alternate.
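Pulling out the main-transcript set can be sketched as a simple FASTA filter. This minimal Python version assumes the record ID is the first whitespace-delimited token after '>' and that main transcripts are exactly those whose ID ends in 't1' (so alternates such as 't2' or 't11' are dropped); the function name and IDs are hypothetical.

```python
import re
import sys

MAIN_SUFFIX = re.compile(r"t1$")  # assumed: main transcript IDs end in 't1'


def main_transcripts_only(fasta_lines):
    """Yield FASTA lines, keeping only records whose ID ends in 't1'.

    The record ID is the first whitespace-delimited token after '>'.
    Sequence lines are kept or dropped along with their header.
    """
    keep = False
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            record_id = tokens[0] if tokens else ""
            keep = bool(MAIN_SUFFIX.search(record_id))
        if keep:
            yield line


if __name__ == "__main__":
    # Usage: python main_t1.py < all_transcripts.fa > main_t1.fa
    sys.stdout.writelines(main_transcripts_only(sys.stdin))
```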
See here:
https://sourceforge.net/p/evidentialgene/blog/2018/03/gene-transcript-id-table-from-evgmrna2tsa/
--
You received this message because you are subscribed to the Google Groups "EvidentialGene" group.
To view this discussion visit https://groups.google.com/d/msgid/evidentialgene/3c8c6fa7-7c67-435a-889b-762784d8be4cn%40googlegroups.com.