Trinity de novo transcriptome assembly: samtools view failed to add PG line to the header


Fairo

Aug 11, 2022, 4:52:54 AM
to trinityrnaseq-users
Dear all,
I hope this message finds you well.

I have assembled the transcriptome of a plant species using Trinity.

Here is the command:
Trinity --samples_file ./sample_file.txt --seqType fq --max_memory 80G --CPU 30 --output ./trinity_outdir

I tried to get transcript abundance using the following command:
/home/programs/build/trinityrnaseq/util/align_and_estimate_abundance.pl --transcripts ./Trinity.fasta --seqType fq --samples_file ./sample_file.txt --thread_count 6 --gene_trans_map ./trinity_outdir.Trinity.fasta.gene_trans_map --est_method RSEM --aln_method bowtie2 --prep_reference --coordsort_bam --output_dir /rsem_outdir/

And I keep getting the following error:
[E::sam_hrecs_update_hashes] Duplicate entry "TRINITY...." in sam header
samtools view: failed to add PG line to the header


I would appreciate your suggestions.

Brian Haas

Aug 11, 2022, 4:22:38 PM
to trinityrnaseq-users
Hi,

My suggestion would be to just run salmon or kallisto and see how that goes. It should be much faster and use fewer resources.

best,

~b

Fairo

Aug 12, 2022, 5:27:57 AM
to trinityrnaseq-users
Thank you for your reply, B.

I tried, but it failed. Salmon requires the genome of the organism in both modes, and I do not have the genome yet. Would it be wise to remove the duplicates from Trinity.fasta? I have a single organism with three different tissues.

Brian Haas

Aug 12, 2022, 5:39:59 AM
to Fairo, trinityrnaseq-users
Hi,

Salmon shouldn't need the genome - just needs the Trinity fasta file and the reads you want to use for quantification.  We support it exactly the same as the other quant methods:

https://github.com/trinityrnaseq/trinityrnaseq/wiki/Trinity-Transcript-Quantification
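For reference, the salmon route through the same wrapper might look like the sketch below, reusing the file names from the RSEM command earlier in this thread. It is a sketch under assumptions: TRINITY_HOME is a hypothetical variable pointing at the Trinity install (/home/programs/build/trinityrnaseq in this thread), quant_with_salmon is a hypothetical function name, and ./salmon_outdir is an arbitrary output directory. Salmon is alignment-free, so the alignment-specific --aln_method and --coordsort_bam options are dropped; swapping in --est_method kallisto should follow the same pattern.

```shell
# Hypothetical wrapper: quantify with salmon via Trinity's
# align_and_estimate_abundance.pl instead of RSEM + bowtie2.
# Assumption: TRINITY_HOME points at the Trinity install directory.
# No genome is needed: only the Trinity assembly and the reads in the samples file.
quant_with_salmon() {
    # ${VAR:?msg} aborts with a message if TRINITY_HOME is unset or empty.
    "${TRINITY_HOME:?set TRINITY_HOME to the Trinity install dir}/util/align_and_estimate_abundance.pl" \
        --transcripts ./Trinity.fasta \
        --seqType fq \
        --samples_file ./sample_file.txt \
        --est_method salmon \
        --gene_trans_map ./trinity_outdir.Trinity.fasta.gene_trans_map \
        --prep_reference \
        --thread_count 6 \
        --output_dir ./salmon_outdir
}
```

Usage: TRINITY_HOME=/home/programs/build/trinityrnaseq quant_with_salmon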

best,

~b

--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/trinityrnaseq-users/b7be4d30-e4b0-438f-a4ba-878925f7dae3n%40googlegroups.com.


--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

DAYOU Olivier

Aug 12, 2022, 7:07:13 AM
to Brian Haas, trinityrnaseq-users
Thank you, Brian.

I tried align_and_estimate_abundance.pl with --est_method salmon.
But it fails; here is the error message:


[Step 1 of 4] : counting k-mers
[2022-08-12 12:58:33.007] [puff::index::jointLog] [error] In FixFasta, two references with the same name but different sequences: TRINITY_DN11452_c0_g1_i3. We require that all input records have a unique name up to the first whitespace (or user-provided separator) character.
Error, cmd: salmon index -t /home/organism ./Trinity.fasta --keepDuplicates -i /home/organism/./Trinity.fasta.salmon.idx -k 31 -p 6 died with ret: 256 at /home/programs/build/trinityrnaseq/util/align_and_estimate_abundance.pl line 729.

--
Olivier Dayou| Skype:  Olivier DAYOU                           




DAYOU Olivier

Aug 12, 2022, 7:09:12 AM
to Brian Haas, trinityrnaseq-users
There was also a warning:
[warning] The salmon index is being built without any decoy sequences.  It is recommended that decoy sequence (either computed auxiliary decoy sequence or the genome of the organism) be provided during indexing.

Brian Haas

Aug 12, 2022, 7:34:21 AM
to DAYOU Olivier, trinityrnaseq-users
It's quite strange that the identifier TRINITY_DN11452_c0_g1_i3 is apparently showing up multiple times.

Are you running salmon on the results from a single Trinity assembly, or did you perhaps concatenate several together?

DAYOU Olivier

Aug 12, 2022, 7:42:47 AM
to Brian Haas, trinityrnaseq-users
I am running salmon on a single Trinity assembly, generated directly by Trinity from three different tissues of the same organism. It is not a concatenated assembly.


Brian Haas

Aug 12, 2022, 7:46:42 AM
to DAYOU Olivier, trinityrnaseq-users
Interesting.

If you grep "TRINITY_DN11452_c0_g1_i3" from the Trinity.fasta file, are you finding multiple records for it?
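A plain grep for the ID also matches any longer ID that shares it as a prefix (for example an ..._i30 isoform, if one exists) and any line that merely mentions it, so counting only header lines whose first field is exactly ">ID" is a bit safer. A minimal sketch, assuming headers look like ">ID other-fields" as in a standard Trinity.fasta; count_id_records is a hypothetical helper name, not part of Trinity.

```shell
# Hypothetical helper: count FASTA header records whose ID (the text after '>'
# up to the first whitespace) exactly equals the given transcript ID.
count_id_records() {
    local id="$1" fasta="$2"
    # awk's default field splitting puts ">ID" in $1; concatenation binds
    # tighter than ==, so this compares $1 with ">" followed by the ID.
    awk -v id="$id" '$1 == ">" id { n++ } END { print n + 0 }' "$fasta"
}
```

Usage: count_id_records TRINITY_DN11452_c0_g1_i3 Trinity.fasta prints the number of matching records; anything above 1 is a real duplicate.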

DAYOU Olivier

Aug 12, 2022, 7:58:11 AM
to Brian Haas, trinityrnaseq-users
Yes, grep TRINITY_DN11452_c0_g1_i3 Trinity.fasta outputs two header lines with the same ID.

Brian Haas

Aug 12, 2022, 8:19:11 AM
to DAYOU Olivier, trinityrnaseq-users
That's very peculiar.  Not supposed to happen.

Try rerunning your original Trinity command. I expect it will just remake the final Trinity.fasta file. Then let's see whether it still contains duplicate entries for this one.
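After the rerun, it may help to check the whole rebuilt Trinity.fasta rather than a single ID. The salmon error quoted in this thread complains about "two references with the same name but different sequences" and says names must be unique up to the first whitespace, so this sketch uses that same rule for the ID and also reports whether the duplicated records' sequences are identical. report_duplicate_ids is a hypothetical helper, not part of Trinity or salmon.

```shell
# Hypothetical helper: for every ID (text after '>' up to the first whitespace)
# that appears on more than one header line, print
#   ID <tab> number_of_records <tab> identical|different
# Two passes over the file: pass 1 counts IDs; pass 2 stores sequences only
# for duplicated IDs, so memory stays modest on a large assembly.
report_duplicate_ids() {
    local fasta="$1"
    awk '
        NR == FNR { if ($0 ~ /^>/) n[substr($1, 2)]++; next }   # pass 1: count IDs
        /^>/ {                                                   # pass 2: header line
            id = substr($1, 2)
            keep = (n[id] > 1)
            if (keep) { seen[id]++; rec = id SUBSEP seen[id] }
            next
        }
        keep { seq[rec] = seq[rec] $0 }                          # pass 2: join wrapped sequence lines
        END {
            for (id in n) {
                if (n[id] < 2) continue
                status = "identical"
                for (i = 2; i <= n[id]; i++)
                    if (seq[id SUBSEP i] != seq[id SUBSEP 1]) status = "different"
                print id "\t" n[id] "\t" status
            }
        }
    ' "$fasta" "$fasta" | sort
}
```

Usage: report_duplicate_ids Trinity.fasta prints nothing when every ID is unique.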

