On Sun, Apr 11, 2021 at 2:05 PM Emily Jennings <emy...@gmail.com> wrote:
>
> I think this may be an issue with BUSCO. I searched the .cds files output by TransDecoder for several of the transcripts whose BUSCOs are now reported as 'missing', and found ORFs of type 'complete' for them:
>
> $ head BUSCOs_missing_after_TransDecoder_fulllist.txt
> 4291 Complete TRINITY_DN33922_c0_g1_i9 945.2 795
> 4459 Fragmented TRINITY_DN22208_c0_g1_i5 825.9 424
>
> # BUSCO reports them as missing
> $ grep -w '4291' longest_orfscds_Unstranded_BUSCO/run_embryophyta_odb10/full_table.tsv
> 4291 Missing
> $ grep -w '4459' longest_orfscds_Unstranded_BUSCO/run_embryophyta_odb10/full_table.tsv
> 4459 Missing
>
> # But the transcripts are indeed present in the .cds output (whether or not I ran TransDecoder with -S)
> $ grep -w TRINITY_DN22208_c0_g1_i5 Trinity_500_longestisoform.fasta.transdecoder.cds
> >TRINITY_DN22208_c0_g1_i5|m.5407 TRINITY_DN22208_c0_g1_i5|g.5407 ORF TRINITY_DN22208_c0_g1_i5|g.5407 TRINITY_DN22208_c0_g1_i5|m.5407 type:complete len:169 (+) TRINITY_DN22208_c0_g1_i5:2005-2511(+)
> >TRINITY_DN22208_c0_g1_i5|m.5406 TRINITY_DN22208_c0_g1_i5|g.5406 ORF TRINITY_DN22208_c0_g1_i5|g.5406 TRINITY_DN22208_c0_g1_i5|m.5406 type:5prime_partial len:572 (+) TRINITY_DN22208_c0_g1_i5:2-1717(+)
> $ grep -w TRINITY_DN33922_c0_g1_i9 Trinity_500_longestisoform.fasta.transdecoder.cds
> >TRINITY_DN33922_c0_g1_i9|m.1708 TRINITY_DN33922_c0_g1_i9|g.1708 ORF TRINITY_DN33922_c0_g1_i9|g.1708 TRINITY_DN33922_c0_g1_i9|m.1708 type:complete len:209 (+) TRINITY_DN33922_c0_g1_i9:3159-3785(+)
> >TRINITY_DN33922_c0_g1_i9|m.1706 TRINITY_DN33922_c0_g1_i9|g.1706 ORF TRINITY_DN33922_c0_g1_i9|g.1706 TRINITY_DN33922_c0_g1_i9|m.1706 type:complete len:336 (+) TRINITY_DN33922_c0_g1_i9:99-1106(+)
> >TRINITY_DN33922_c0_g1_i9|m.1707 TRINITY_DN33922_c0_g1_i9|g.1707 ORF TRINITY_DN33922_c0_g1_i9|g.1707 TRINITY_DN33922_c0_g1_i9|m.1707 type:complete len:286 (+) TRINITY_DN33922_c0_g1_i9:1239-2096(+)
>
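The spot checks above could be run over the whole list in one pass. A rough sketch, assuming the list file has the whitespace-separated layout shown in the head output (BUSCO ID, earlier status, transcript ID as the first three columns) and that full_table.tsv carries the BUSCO ID and status in its first two columns. The function name check_missing_buscos is made up for illustration and is not part of BUSCO or TransDecoder:

```shell
#!/usr/bin/env bash
# Sketch only: for each BUSCO ID in the "missing after TransDecoder" list,
# report the status in the post-TransDecoder full_table.tsv and the number of
# ORF header lines TransDecoder wrote for the matching transcript.
# Assumed list layout: busco_id  earlier_status  transcript_id  [other columns]
check_missing_buscos() {
    local list="$1" full_table="$2" cds="$3"
    local busco_id prev_status transcript rest now n_orfs
    while read -r busco_id prev_status transcript rest; do
        [ -z "$busco_id" ] && continue
        # Status in the later BUSCO run (column 2); '#' comment lines never match
        now=$(awk -v id="$busco_id" '$1 == id { print $2; exit }' "$full_table")
        # Count lines mentioning this exact transcript ID (-w keeps i5 from matching i50)
        n_orfs=$(grep -c -w -F -- "$transcript" "$cds")
        # Columns: busco_id, transcript, earlier status, current status, ORF count
        printf '%s\t%s\t%s\t%s\t%s\n' \
            "$busco_id" "$transcript" "$prev_status" "${now:-not_listed}" "$n_orfs"
    done < "$list"
}
```

With the file names from this thread, the call would look like:

  $ check_missing_buscos BUSCOs_missing_after_TransDecoder_fulllist.txt longest_orfscds_Unstranded_BUSCO/run_embryophyta_odb10/full_table.tsv Trinity_500_longestisoform.fasta.transdecoder.cds

Rows that show Missing in the fourth column with a non-zero ORF count in the fifth are the cases described above. This only confirms that an ORF was written for the transcript; it does not tell you whether that ORF covers the region BUSCO matched originally.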
>
> On Sunday, April 11, 2021 at 1:35:40 PM UTC-4 Emily Jennings wrote:
>>
>> I re-ran TransDecoder without the -S flag and the BUSCO results were unchanged (still ~49% complete after TransDecoder).
>>
>> On Sunday, April 11, 2021 at 12:55:50 PM UTC-4 Emily Jennings wrote:
>>>
>>> Hi Brian, thanks for the fast reply!
>>>
>>> As you suggested, I extracted the transcript sequences for all BUSCOs that were present in step 3 but missing from step 4. I am sending you a copy of this file.
>>>
>>> I randomly selected one of these sequences (TRINITY_DN34533_c0_g1_i1, which initially matched BUSCO 986) and searched it with ORFfinder (NCBI), which identified multiple ORFs, one of which was quite long:
>>>
>>>
>>>