gene-trans-map from non-trinity assembly

948 views
Skip to first unread message

M. Olalla Lorenzo-Carballa

unread,
May 24, 2016, 7:39:38 PM5/24/16
to trinityrnaseq-users
Hello

I came across a coouple of posts on this, but in my case I cannot make it work. I want to try to run the scripts within trinity on a non-trinity assembly. My data were assembled with trinity and afterwards i used Corset to cluster them. I parsed the output from corset to create my custom gene to transcript map which looks like this:

Cluster-23194.1    Cluster-23194.1_TRINITY_DN5872_c0_g1_i2
Cluster-23194.1    Cluster-23194.1_TRINITY_DN5872_c0_g1_i1
Cluster-23194.0    Cluster-23194.0_TRINITY_DN17006_c0_g1_i6
Cluster-23194.0    Cluster-23194.0_TRINITY_DN17006_c0_g1_i5
Cluster-23194.0    Cluster-23194.0_TRINITY_DN17006_c0_g1_i4
Cluster-23194.0    Cluster-23194.0_TRINITY_DN17006_c0_g1_i1
Cluster-23194.0    Cluster-23194.0_TRINITY_DN17006_c0_g1_i7
Cluster-23194.0    Cluster-23194.0_TRINITY_DN17006_c0_g1_i3
Cluster-23194.0    Cluster-23194.0_TRINITY_DN17006_c0_g1_i2
Cluster-22292.0    Cluster-22292.0_TRINITY_DN16113_c0_g1_i3
Cluster-22292.0    Cluster-22292.0_TRINITY_DN16113_c0_g1_i2
Cluster-22292.0    Cluster-22292.0_TRINITY_DN16113_c0_g1_i1
Cluster-22704.0    Cluster-22704.0_TRINITY_DN1042_c0_g1_i1
Cluster-4190.0    Cluster-4190.0_TRINITY_DN26939_c0_g1_i1


However when running the align and estimate abundance script with the --gene_trans_map option, my RSEM outputs look like this:

the genes.results:

gene_id transcript_id(s)        length  effective_length        expected_count  TPM     FPKM
Cluster-0.0_TRINITY_DN2726_c0_g1_i1     Cluster-0.0_TRINITY_DN2726_c0_g1_i1     262.00  113.72  0.00    0.00    0.00
Cluster-0.0_TRINITY_DN2726_c0_g2_i1     Cluster-0.0_TRINITY_DN2726_c0_g2_i1     262.00  113.72  0.00    0.00    0.00
Cluster-1.0_TRINITY_DN18976_c12_g1_i3   Cluster-1.0_TRINITY_DN18976_c12_g1_i3   912.00  762.22  0.00    0.00    0.00
Cluster-1.1_TRINITY_DN18976_c12_g1_i1   Cluster-1.1_TRINITY_DN18976_c12_g1_i1   343.00  193.44  0.00    0.00    0.00
Cluster-1.1_TRINITY_DN18976_c12_g1_i5   Cluster-1.1_TRINITY_DN18976_c12_g1_i5   864.00  714.22  0.00    0.00    0.00
Cluster-1.2_TRINITY_DN18976_c12_g1_i2   Cluster-1.2_TRINITY_DN18976_c12_g1_i2   781.00  631.22  0.00    0.00    0.00
Cluster-1.3_TRINITY_DN18976_c12_g1_i4   Cluster-1.3_TRINITY_DN18976_c12_g1_i4   268.00  119.53  0.00    0.00    0.00
Cluster-1.3_TRINITY_DN18976_c12_g1_i6   Cluster-1.3_TRINITY_DN18976_c12_g1_i6   839.00  689.22  0.00    0.00    0.00
Cluster-1.3_TRINITY_DN18976_c12_g1_i7   Cluster-1.3_TRINITY_DN18976_c12_g1_i7   829.00  679.22  0.00    0.00    0.00


and the isoforms.results:

transcript_id   gene_id length  effective_length        expected_count  TPM     FPKM    IsoPct
Cluster-0.0_TRINITY_DN2726_c0_g1_i1     Cluster-0.0_TRINITY_DN2726_c0_g1_i1     262     113.72  0.00    0.00    0.00    0.00
Cluster-0.0_TRINITY_DN2726_c0_g2_i1     Cluster-0.0_TRINITY_DN2726_c0_g2_i1     262     113.72  0.00    0.00    0.00    0.00
Cluster-1.0_TRINITY_DN18976_c12_g1_i3   Cluster-1.0_TRINITY_DN18976_c12_g1_i3   912     762.22  0.00    0.00    0.00    0.00
Cluster-1.1_TRINITY_DN18976_c12_g1_i1   Cluster-1.1_TRINITY_DN18976_c12_g1_i1   343     193.44  0.00    0.00    0.00    0.00
Cluster-1.1_TRINITY_DN18976_c12_g1_i5   Cluster-1.1_TRINITY_DN18976_c12_g1_i5   864     714.22  0.00    0.00    0.00    0.00
Cluster-1.2_TRINITY_DN18976_c12_g1_i2   Cluster-1.2_TRINITY_DN18976_c12_g1_i2   781     631.22  0.00    0.00    0.00    0.00
Cluster-1.3_TRINITY_DN18976_c12_g1_i4   Cluster-1.3_TRINITY_DN18976_c12_g1_i4   268     119.53  0.00    0.00    0.00    0.00
Cluster-1.3_TRINITY_DN18976_c12_g1_i6   Cluster-1.3_TRINITY_DN18976_c12_g1_i6   839     689.22  0.00    0.00    0.00    0.00
Cluster-1.3_TRINITY_DN18976_c12_g1_i7   Cluster-1.3_TRINITY_DN18976_c12_g1_i7   829     679.22  0.00    0.00    0.00    0.00
Cluster-10.0_TRINITY_DN25056_c0_g1_i1   Cluster-10.0_TRINITY_DN25056_c0_g1_i1   283     134.14  0.00    0.00    0.00    0.00


So basically it seems to be ignoring completely the information on my provided map file and both the gene and isoform files are the same.... Does anyone have an idea on why this could be happening?  The map file is a tab-delimited one, so it should 8potentially work) as I have seen from other posts in the group.

Any input would be greatly appreciated


Thanks in advance

Cheers

Olalla

Brian Haas

unread,
May 25, 2016, 7:30:04 AM5/25/16
to M. Olalla Lorenzo-Carballa, trinityrnaseq-users
Hi

Can you send the command you ran for the align and estimate step?

-Brian
(by iPhone)

--
You received this message because you are subscribed to the Google Groups "trinityrnaseq-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to trinityrnaseq-u...@googlegroups.com.
To post to this group, send email to trinityrn...@googlegroups.com.
Visit this group at https://groups.google.com/group/trinityrnaseq-users.
For more options, visit https://groups.google.com/d/optout.

M. Olalla Lorenzo-Carballa

unread,
May 25, 2016, 8:09:27 AM5/25/16
to Brian Haas, trinityrnaseq-users
Hello Brian

Thanks for your (as always) quick response. See the command below:

perl /pub46/olalla/trinityrnaseq-2.1.1/util/align_and_estimate_abundance.pl --thread_count 40 --gene_trans_map /pub46/olalla/DE_after_corset/Hastata_All_CORSET.gene_trans_map --output_dir /pub46/olalla/DE_after_corset --transcripts /pub46/olalla/DE_after_corset/Hastata_Corseted_Assembly.fasta --seqType fq --left /pub46/olalla/Individual_reads/ASEX1-Head-1.fq --right /pub46/olalla/Individual_reads/ASEX1-Head-2.fq --output_prefix ASEX1 --coordsort_bam --est_method RSEM --aln_method bowtie 

No errors reported and all ran normally only that the output files for RSEM did not make sense according to the gene to transcript map provided.


Hope you can help me with this.


However now I came accross another problem: while I waited for the answer to my question I have been searching for potential causes for this mistake and I have tried to remove any possible not-unix characters from my map file. After that I wanted to re-run the align_and_estimate_abundance.pl script again, using the bam files that I do have from the first run. Nevertheless when I try to run the script like this:

perl /pub46/olalla/trinityrnaseq-2.1.1/util/align_and_estimate_abundance.pl --thread_count 40 --gene_trans_map /pub46/olalla/DE_after_corset/Hastata_All_CORSET.gene_trans_map --output_dir /pub46/olalla/DE_after_corset --transcripts /pub46/olalla/DE_after_corset/Hastata_All_CORSET_Assembly.fasta --est_method RSEM --transcripts /pub46/olalla/DE_after_corset/Hastata_Corseted_Assembly.fasta --seqType fq --left /pub46/olalla/Individual_reads/ASEX1-Head-1.fq --right /pub46/olalla/Individual_reads/ASEX1-Head-2.fq --output_prefix ASEX1 --aln_method /pub46/olalla/DE_after_corset/ASEX1.bam

but I get another error, despite that the manual it says the following:

if alignment_based est_method:
#       --aln_method <string>            bowtie|bowtie2|(path to bam file) alignment method.  (note: RSEM requires bowtie)
#                                       (if you already have a bam file, you can use it here instead of rerunning bowtie)
#

I have tried lots of different things and not success on this, so I am a bit stuck at the moment.

Thanks in advance for your help


Cheers

Olalla

--
M. Olalla Lorenzo-Carballa, PhD.

Institute of Integrative Biology 
Evolution, Ecology and Behaviour
 
Biosciences Building, 
Crown Street, 
University of Liverpool,
Liverpool, L69 7ZB. 
United Kingdom.


Brian Haas

unread,
May 25, 2016, 8:53:34 AM5/25/16
to M. Olalla Lorenzo-Carballa, trinityrnaseq-users
My guess is that an RSEM index was built earlier on w/out having used the gene-trans-map file, and this rsem index is being reused.

Try deleting any RSEM-related files attached to:  /pub46/olalla/DE_after_corset/Hastata_Corseted_Assembly.fasta

and then rerunning the align-estimate step, and let's see how it goes.

best,

~brian

M. Olalla Lorenzo-Carballa

unread,
May 25, 2016, 9:50:55 AM5/25/16
to Brian Haas, trinityrnaseq-users
Hello Brian

Thanks again for your input. I am not sure, however to which problem of the two I have addressed you refer in your message.

In any case, and to be safe I have deleted all the files (even the bam files) generated from running the script yesterday and now I am rerunning it from scratch, by providing the same gene map file but after cleaning it from any potential non unix character.

hopefully all will be fine. I cannot see what is happening beyond that, as, from what I saw in other posts within the group it seems like the scripts will address any custom map file as long as it is in the format gene(tab)transcript, right?

Will give feedback on the results of the re-trial... hope will be for good!

Cheers

Olalla

Brian Haas

unread,
May 25, 2016, 9:54:24 AM5/25/16
to M. Olalla Lorenzo-Carballa, trinityrnaseq-users
Let's see what happens on the re-run and we'll take it from there.

best,

~b
--
--
Brian J. Haas
The Broad Institute
http://broadinstitute.org/~bhaas

 

M. Olalla Lorenzo-Carballa

unread,
May 25, 2016, 10:47:22 AM5/25/16
to Brian Haas, trinityrnaseq-users
Half way through now and what I see is that the results are the same. See below the output of head both genes and isoforms results files:

olalla@ada06:~/DE_after_corset$ head ASEX1.genes.results
gene_id    transcript_id(s)    length    effective_length    expected_count    TPM    FPKM
Cluster-0.0_TRINITY_DN2726_c0_g1_i1    Cluster-0.0_TRINITY_DN2726_c0_g1_i1    262.00    113.72    0.00    0.00    0.00
Cluster-0.0_TRINITY_DN2726_c0_g2_i1    Cluster-0.0_TRINITY_DN2726_c0_g2_i1    262.00    113.72    0.00    0.00    0.00
Cluster-1.0_TRINITY_DN18976_c12_g1_i3    Cluster-1.0_TRINITY_DN18976_c12_g1_i3    912.00    762.22    0.00    0.00    0.00
Cluster-1.1_TRINITY_DN18976_c12_g1_i1    Cluster-1.1_TRINITY_DN18976_c12_g1_i1    343.00    193.44    0.00    0.00    0.00
Cluster-1.1_TRINITY_DN18976_c12_g1_i5    Cluster-1.1_TRINITY_DN18976_c12_g1_i5    864.00    714.22    0.00    0.00    0.00
Cluster-1.2_TRINITY_DN18976_c12_g1_i2    Cluster-1.2_TRINITY_DN18976_c12_g1_i2    781.00    631.22    0.00    0.00    0.00
Cluster-1.3_TRINITY_DN18976_c12_g1_i4    Cluster-1.3_TRINITY_DN18976_c12_g1_i4    268.00    119.53    0.00    0.00    0.00
Cluster-1.3_TRINITY_DN18976_c12_g1_i6    Cluster-1.3_TRINITY_DN18976_c12_g1_i6    839.00    689.22    0.00    0.00    0.00
Cluster-1.3_TRINITY_DN18976_c12_g1_i7    Cluster-1.3_TRINITY_DN18976_c12_g1_i7    829.00    679.22    0.00    0.00    0.00
olalla@ada06:~/DE_after_corset$ head ASEX1.isoforms.results

transcript_id    gene_id    length    effective_length    expected_count    TPM    FPKM    IsoPct
Cluster-0.0_TRINITY_DN2726_c0_g1_i1    Cluster-0.0_TRINITY_DN2726_c0_g1_i1    262    113.72    0.00    0.00    0.00    0.00
Cluster-0.0_TRINITY_DN2726_c0_g2_i1    Cluster-0.0_TRINITY_DN2726_c0_g2_i1    262    113.72    0.00    0.00    0.00    0.00
Cluster-1.0_TRINITY_DN18976_c12_g1_i3    Cluster-1.0_TRINITY_DN18976_c12_g1_i3    912    762.22    0.00    0.00    0.00    0.00
Cluster-1.1_TRINITY_DN18976_c12_g1_i1    Cluster-1.1_TRINITY_DN18976_c12_g1_i1    343    193.44    0.00    0.00    0.00    0.00
Cluster-1.1_TRINITY_DN18976_c12_g1_i5    Cluster-1.1_TRINITY_DN18976_c12_g1_i5    864    714.22    0.00    0.00    0.00    0.00
Cluster-1.2_TRINITY_DN18976_c12_g1_i2    Cluster-1.2_TRINITY_DN18976_c12_g1_i2    781    631.22    0.00    0.00    0.00    0.00
Cluster-1.3_TRINITY_DN18976_c12_g1_i4    Cluster-1.3_TRINITY_DN18976_c12_g1_i4    268    119.53    0.00    0.00    0.00    0.00
Cluster-1.3_TRINITY_DN18976_c12_g1_i6    Cluster-1.3_TRINITY_DN18976_c12_g1_i6    839    689.22    0.00    0.00    0.00    0.00
Cluster-1.3_TRINITY_DN18976_c12_g1_i7    Cluster-1.3_TRINITY_DN18976_c12_g1_i7    829    679.22    0.00    0.00    0.00    0.00
olalla@ada06:~/DE_after_corset$

Any other idea on what may be going on? Is it possible that the "-" and "." in the Cluster part of the name are creating a problem?


I am just trying to think on possible causes to this, thanks in advance for any other input


Best
Olalla

Brian Haas

unread,
May 25, 2016, 11:04:58 AM5/25/16
to M. Olalla Lorenzo-Carballa, trinityrnaseq-users
I'm not sure what the issue is here.  You can try running kallisto instead and see if that does what you want.  If you want to stick with RSEM, then perhaps try running RSEM directly instead of using our wrapper, and you can troubleshoot RSEM directly with the RSEM team.

best,

~brian

M. Olalla Lorenzo-Carballa

unread,
May 26, 2016, 8:16:30 AM5/26/16
to trinityrnaseq-users, m.o.lorenz...@gmail.com
To close this thread :)

So, whole day trying to figure out why this was happening as it did not make any sense to me, parsing all the files, and trying again with the same error, JUST NOW I realized that my script for preparing the reference was missing the --gene_trans_map option, hence the outputs, no matter whether I provided with this option later in the alignment step.

So, now it is sorted... silly mistakes!

So the script runs perfectly with any custom trans map provided...
Reply all
Reply to author
Forward
0 new messages