How to consolidate multiple assemblies using Salmon eq_classes instead of BAM files


podu...@gmail.com

Jun 11, 2019, 1:40:17 PM
to corset-project
Hi
There was an earlier discussion of how to consolidate multiple assembly types, but it used .bam files (see link below from 9/14/2017). The procedure was basically to first run corset with the .bam files and the -r true-stop option across the different assemblies for the same sample. This would produce a .corset-read file for each sample. Those files would then be the input for a second corset run across the samples (i.e., with -g describing the sample groups and -i corset for the .corset-read input type).

I was going to follow this method for my Salmon eq_classes files, but realized that using -i salmon_eq_classes for the first run turns off the -r option, so I do not get any .corset-read files from the first run (i.e., one for each sample). All I get out are cluster and count files.

So is it possible to first consolidate assemblies for each sample using the Salmon eq_classes files, and then do a second run of corset across all the samples?

If not, do I have to write a huge list of 17 assemblies x 45 samples = 765 names in one command? I suppose I could also re-run it and make .bam files.

Any suggestions would be appreciated!
Peter


https://groups.google.com/forum/#!topic/corset-project/ShzIVSWOWZk

Nadia Davidson

Jun 11, 2019, 4:30:14 PM
to corset-project
Hi Peter,

The salmon eq-classes have similar information to the .corset-read files (both summarise read alignments to equivalence classes and list transcript compatibility counts), so they are already at the summarised level. To merge assemblies you really need to start with the read information (bam level).

The fastest approach, I think, would be to realign the data with salmon, but using all assemblies together. I don't think salmon takes multiple assemblies in its index creation, but you can make one big assembly with cat, like "cat assembly1.fasta assembly2.fasta etc. > all_assemblies.fasta", then run salmon with all_assemblies.fasta to get one eq-classes file per sample and run these through corset in the usual way.
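
As a rough sketch of that workflow (not a tested recipe): the helper below concatenates the assemblies into one FASTA file, and a second function shows the salmon and corset calls that would follow. The function names, the asm1_/asm2_ ID prefixes, the sample names, the read file names and the group labels are all placeholders or assumptions that do not come from this thread. The prefixing is only there in case different assemblies reuse the same contig names; a plain cat does the same job when names are already unique.

```shell
#!/bin/sh
set -eu

# Concatenate assembly FASTA files into one file. Each sequence ID gets a
# per-file prefix (asm1_, asm2_, ...); the prefix is an added assumption to
# keep identically named contigs from different assemblies distinct.
merge_assemblies() (
    out=$1
    shift
    : > "$out"
    n=0
    for fasta in "$@"; do
        n=$((n + 1))
        awk -v tag="asm${n}_" '
            /^>/ { print ">" tag substr($0, 2); next }
            { print }
        ' "$fasta" >> "$out"
    done
)

# Index the merged assembly, quantify each sample with --dumpEq, then cluster.
# Not called here; sample names, read file names and groups are placeholders.
quantify_and_cluster() {
    salmon index -t all_assemblies.fasta -i all_assemblies_index
    for s in sampleA sampleB; do
        salmon quant -i all_assemblies_index -l A \
            -1 "${s}_R1.fastq.gz" -2 "${s}_R2.fastq.gz" \
            --dumpEq -o "salmon_${s}"
    done
    corset -g 1,2 -n sampleA,sampleB -i salmon_eq_classes \
        salmon_sampleA/aux_info/eq_classes.txt \
        salmon_sampleB/aux_info/eq_classes.txt
}
```

Typical use would be "merge_assemblies all_assemblies.fasta assembly1.fasta assembly2.fasta" followed by "quantify_and_cluster". Some salmon versions write eq_classes.txt.gz rather than eq_classes.txt, in which case the files may need to be decompressed before corset reads them.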

Good luck.

Cheers,
Nadia.

Peter O Dunn

Jun 11, 2019, 5:36:11 PM
to Nadia Davidson, corset-project

Dear Nadia


Thanks for the tip. That makes a lot of sense. Now I’m wondering why I didn’t think of that earlier. And thanks for the quick response!

Best,

Peter

 

--

Peter Dunn

Dist. Professor

Dept of Biological Sciences

Univ. of Wisconsin-Milwaukee

PO Box 413

Milwaukee, WI 53201

414-229-2253

http://people.uwm.edu/pdunn/
