Merging outputs from multiple samples

Giorgio Casaburi

Oct 27, 2017, 5:06:08 PM
to shortbred-users
Hi all,

I have a couple of questions I was hoping you could answer:

1) Is there a script to merge multiple output files from ShortBRED (like the ones MetaPhlAn and HUMAnN2 provide)?

2) Once the files are merged, does the merged file need to be re-normalized, given the way ShortBRED generates the protein profile?

Thanks a lot in advance,
Giorgio

Jim Kaminski

Oct 28, 2017, 4:03:20 PM
to Giorgio Casaburi, shortbred-users
Hi Giorgio,

Thank you for using ShortBRED!

1) We currently do not have a script for merging ShortBRED output. Typically, I merge ShortBRED output in R. Python users may want to use pandas.

If there is demand for a script, I could add one to the utilities.
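
For Python users, a merge along these lines could be a starting point. This is only a minimal sketch, not an official ShortBRED utility: the function name merge_shortbred_outputs is made up for illustration, it assumes each output file is tab-delimited with a "Family" column next to the "Count" column, and it assumes a family missing from one sample should be reported as 0.

```python
from pathlib import Path

import pandas as pd


def merge_shortbred_outputs(paths, family_col="Family", count_col="Count"):
    """Combine per-sample tables into one family-by-sample table.

    Assumption: every file is tab-delimited and has one row per family,
    with the column names given by family_col and count_col.
    """
    columns = []
    for path in paths:
        path = Path(path)
        table = pd.read_csv(path, sep="\t")
        # Label each sample by its file name without the extension.
        counts = table.set_index(family_col)[count_col].rename(path.stem)
        columns.append(counts)
    # Outer join on family; assumption: a family absent from a file counts as 0.
    merged = pd.concat(columns, axis=1, join="outer").fillna(0)
    merged.index.name = family_col
    return merged.sort_index()
```

Calling merge_shortbred_outputs(["sampleA.txt", "sampleB.txt"]) would return a table with one row per family and one column per sample, which can then be written out with merged.to_csv("merged.tsv", sep="\t"). The file names here are placeholders.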

2) No, you do not need to renormalize the files. Assuming each of your output files corresponds to one particular metagenomic sample, the values in the "Count" column are already normalized. (If you would like more details on the normalization procedure, it's explained in the section "Profiling protein family metagenomic abundance with ShortBRED-Quantify" in the ShortBRED paper.)

Thank you,

Jim


--
You received this message because you are subscribed to the Google Groups "shortbred-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to shortbred-users+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Giorgio Casaburi

Oct 28, 2017, 4:12:27 PM
to Jim Kaminski, shortbred-users
Hi Jim,

Thank you so much for your answer. I thought you needed to re-normalize the RPKM values after merging tables from different samples, just as HUMAnN2 recommends. Maybe you have used a different approach than they do to obtain the count.

Giorgio 

--
________________________________
Giorgio Casaburi, Ph.D.
Bioinformatics Scientist
2121 Second Street
Suite B107
Davis, CA 95618

Eric Franzosa

Oct 30, 2017, 10:00:16 AM
to Giorgio Casaburi, Jim Kaminski, shortbred-users
Hi Giorgio,

Speaking on behalf of the HUMAnN2 team: confirmed, HUMAnN2 outputs abundance in units of RPK (not RPKM), meaning that its initial output is _not_ normalized for sequencing depth. The reason for this is 1) to allow users to pick their means of normalization (relative abundance, copies per million, etc.) and 2) to enable analyses that depend on more count-like abundance, such as strain profiling.
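
To make the two normalization choices concrete, here is a rough sketch of the arithmetic for a single sample. It is an illustration only and not HUMAnN2's own renormalization code: the function name renormalize and the feature names are hypothetical, and it assumes "relative abundance" means values scaled to sum to 1 and "copies per million" means values scaled to sum to one million.

```python
def renormalize(rpk_by_feature, units="cpm"):
    """Scale one sample's RPK values to sum to 1 (relab) or 1e6 (cpm)."""
    scale = {"relab": 1.0, "cpm": 1_000_000.0}[units]
    total = sum(rpk_by_feature.values())
    if total == 0:
        # Nothing was detected in this sample; avoid dividing by zero.
        return {feature: 0.0 for feature in rpk_by_feature}
    return {
        feature: value / total * scale
        for feature, value in rpk_by_feature.items()
    }


# Hypothetical RPK values for one sample.
sample = {"geneA": 30.0, "geneB": 10.0}
print(renormalize(sample, units="relab"))  # {'geneA': 0.75, 'geneB': 0.25}
print(renormalize(sample, units="cpm"))    # {'geneA': 750000.0, 'geneB': 250000.0}
```

Because the scaling depends on each sample's own total, it has to be applied per sample (per column of a merged table), not to the merged table as a whole.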

Thanks,
Eric



Giorgio Casaburi

Oct 30, 2017, 10:29:50 AM
to Eric Franzosa, Jim Kaminski, shortbred-users
Hi Eric,

1000 thanks for the clarification. It’s great to know the difference between the two outputs now.

Best wishes,
Giorgio

