Have many samples, how to do binning?


liuguoqi...@gmail.com

Aug 15, 2017, 2:00:57 AM
to BinSanity
I have many samples; how should I do the binning? Should I combine all the assembly.fasta files into one, or bin each assembly.fasta separately?

edgraham

Aug 19, 2017, 1:41:23 PM
to BinSanity, liuguoqi...@gmail.com
Hello,

So BinSanity requires two things: your assembly file, and BAM files for all of the samples used in the assembly. Once you have your assembly and your BAM files, you can use 'Binsanity-profile' to generate a coverage profile, which will contain read-coverage information for each contig across all of your samples. That file is then used as input into BinSanity (typically we choose a scaled transformation, which is an option in the Binsanity-profile script). You can then run BinSanity using those files.
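
For reference, here is a rough sketch of what those two steps look like on the command line (file names are placeholders, and the exact flag names may differ between BinSanity versions, so double-check `Binsanity-profile -h` and `Binsanity -h`):

# build the coverage profile from the assembly plus a directory of sorted, indexed BAM files
Binsanity-profile -i assembly.fa -s bam_files/ -c assembly_cov --transform scale

# cluster contigs using the resulting coverage file (the scaled transform writes a *.cov.x100.lognorm file)
Binsanity -f . -l assembly.fa -c assembly_cov.cov.x100.lognorm -o BINSANITY-RESULTS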

I was unclear on what the question was here because I am not sure what the two fasta files you are referring to are, so if I didn't answer your question, please let me know and I'll elaborate further.

Regards,
Elaina

liuguoqi...@gmail.com

Aug 22, 2017, 3:37:38 AM
to BinSanity, liuguoqi...@gmail.com
Hello,
Thanks for your reply. I am confused about the relationship between the assembly fasta and the BAM files. For example, if I have four metagenome samples assembled individually, I will get 4 assembly fasta files. Should each sample's reads then be mapped to its own assembly fasta separately, or should I combine the 4 assembly fasta files into one, use cd-hit to remove redundancy and generate "uniq.fasta", and finally map each sample's reads to uniq.fasta?
Thanks!
GuoqiLiu

On Sunday, August 20, 2017 at 1:41:23 AM UTC+8, edgraham wrote:

edgraham

Aug 23, 2017, 12:49:31 PM
to BinSanity, liuguoqi...@gmail.com
So you'll want to create a unique fasta file (cd-hit is one way to do that), and then generate a BAM file for each sample individually using bowtie (or some other mapping software). You will then have the uniq.fasta and four BAM files. Once you have these, you can use Binsanity-profile to generate the coverage file needed as input to BinSanity. This file will contain information on the coverage of each contig across all four of your samples.
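
Roughly, the steps look like this (a sketch only: the sample names, file names, 99% identity cutoff, and thread counts are placeholders, and you should confirm the Binsanity-profile flags with `Binsanity-profile -h`):

# 1) pool the individual assemblies and collapse redundant contigs
cat assembly1.fa assembly2.fa assembly3.fa assembly4.fa > combined.fa
cd-hit-est -i combined.fa -o uniq.fasta -c 0.99 -n 10 -M 16000 -T 8

# 2) map each sample's reads to uniq.fasta and make a sorted, indexed BAM per sample
bowtie2-build uniq.fasta uniq_index
for s in sample1 sample2 sample3 sample4; do
    bowtie2 -x uniq_index -1 ${s}_R1.fastq -2 ${s}_R2.fastq -p 8 | samtools sort -o ${s}.sorted.bam -
    samtools index ${s}.sorted.bam
done

# 3) build the coverage profile across all four BAMs for input into BinSanity
Binsanity-profile -i uniq.fasta -s . -c uniq_cov --transform scale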

liuguoqi...@gmail.com

Aug 23, 2017, 11:59:17 PM
to BinSanity, liuguoqi...@gmail.com

OK, thanks!
I got it! Do you have a better way to handle it?

On Thursday, August 24, 2017 at 12:49:31 AM UTC+8, edgraham wrote:

edgraham

Sep 5, 2017, 1:30:40 PM
to BinSanity, liuguoqi...@gmail.com

So the two methods we have used are either doing a co-assembly or using a method with CD-HIT and minimus2, which was written up on protocols.io here.
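
In rough outline, the CD-HIT + minimus2 route continues from the non-redundant fasta like this (a sketch only; the exact parameter values are in the protocols.io write-up, and the ones below are just placeholders):

# starting from the cd-hit output (e.g. uniq.fasta), convert to AMOS format
toAmos -s uniq.fasta -o merged.afg

# merge overlapping contigs with minimus2
minimus2 merged -D REFCOUNT=0 -D OVERLAP=100 -D MINID=95

# combine merged contigs and unmerged singletons into the final assembly
cat merged.fasta merged.singletons.seq > merged_assembly.fa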

I hope that helps!

liuguoqi...@gmail.com

Sep 5, 2017, 11:15:36 PM
to BinSanity, liuguoqi...@gmail.com
OK, thank you very much!
Best wishes!

On Wednesday, September 6, 2017 at 1:30:40 AM UTC+8, edgraham wrote:

alanwongd...@gmail.com

Jun 27, 2018, 2:02:51 PM
to BinSanity
To chime in, does BinSanity only work well with multiple BAM files and a unique co-assembled fasta file?

I have a trial sample, with one assembled fasta file and one sorted BAM file.

My command line is as follows:
Binsanity-wf -f /directory -l contig1000.fa -c coverage.cov.x100.lognorm -x 2000 -o Bins

It only produced 12 bins, and CheckM shows that they all have high contamination levels.

So would it be better to have a co-assembled fasta file for better binning performance?

Cheers

Alan


Elaina

Jun 28, 2018, 2:17:11 PM
to BinSanity
Hi Alan,

BinSanity relies heavily on differential coverage to cluster, so having only 1 BAM file would be problematic. In Binsanity-wf there are 3 major steps:
1. Initial clustering using differential coverage
2. Evaluation of the bins via CheckM
3. Refinement of the bins using tetranucleotide frequencies and GC%

When you only have one sample, that initial step where you cluster with differential coverage won't work very well. Assembly methods can vary depending on style: co-assemblies work, or you could do something like this to merge individual assemblies using cd-hit and minimus2. You could even do a subsampled assembly. The key is that you want some small overlap or similarity between samples for the differential-coverage binning to work successfully. In general, when clustering with any method (whether it is BinSanity, CONCOCT, MetaBAT2, etc.), more samples lead to better binning. BinSanity excels when you have at least 4-5 samples, and accuracy decreases with fewer.

My advice for working with one sample is to adjust the preferences. In Binsanity-wf there is the flag `-p`, which is the preference for the initial clustering; the default is -3. Change this to -2. Then change the `--refine-preference` flag to -15. The larger the preference, the more sensitive Affinity Propagation is to creating clusters, meaning it will ultimately produce more clusters the higher the preference is. Unfortunately, defaults aren't always optimal for every data type :(
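
Using Alan's command from earlier in the thread as a template, the adjusted run would look roughly like this (everything except the two preference flags is unchanged):

Binsanity-wf -f /directory -l contig1000.fa -c coverage.cov.x100.lognorm -x 2000 -o Bins -p -2 --refine-preference -15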

I hope that was helpful!

Alan Wong

Jun 28, 2018, 8:57:14 PM
to BinSanity
Thanks Elaina!

So say I have 5 samples (S1, S2, S3, S4, S5). The first thing I have to do is co-assemble them together (concatenate the reads pre-assembly and proceed to contig assembly).
Then I map the reads from the individual fastq files (S1.fastq, S2.fastq, ...) to get BAM files (S1.bam, S2.bam, ..., S5.bam). Then I can start with Binsanity-profile.
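Roughly, something like this (just a sketch, assuming paired-end reads and using MEGAHIT as an example co-assembler; tool choice and flags are only illustrative):

# co-assemble all five read sets together
megahit -1 S1_R1.fastq,S2_R1.fastq,S3_R1.fastq,S4_R1.fastq,S5_R1.fastq \
        -2 S1_R2.fastq,S2_R2.fastq,S3_R2.fastq,S4_R2.fastq,S5_R2.fastq \
        -o coassembly

# map each sample back to the co-assembly individually to get S1.bam ... S5.bam
bowtie2-build coassembly/final.contigs.fa coassembly_index
for s in S1 S2 S3 S4 S5; do
    bowtie2 -x coassembly_index -1 ${s}_R1.fastq -2 ${s}_R2.fastq | samtools sort -o ${s}.bam -
    samtools index ${s}.bam
done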

Is this right for differential coverage binning?

Cheers

Alan