SOAPdenovo2 without '-u' -no average contig coverage reported


Marek Piatek

Sep 29, 2012, 4:54:57 AM
to bgi-...@googlegroups.com
Hi SOAPers,

When I run SOAPdenovo2 with '-u' (un-mask contigs with high/low coverage before scaffolding; the default is [mask]), the average contig coverage is not reported at the beginning of the 'scaff' step. When I run the same command without '-u', the average contig coverage is reported (see the log fragments below).

Can the average contig coverage simply be calculated by extracting the coverage value from the FASTA header of each contig in the 'filename.contig' file? Or is it stored somewhere?
For example, given the FASTA header '>22960 length 4378 cvg_41.2_tip_1', is the base coverage of that contig equal to 41.2?
Also, what does '_tip_1' mean in this header?

I would appreciate some help here.

Fragment 1:
********************
Scaff (with '-u')
********************

Parameters: scaff -g filename -F -p 8 -u

Files for scaffold construction are OK.

There are 1 grad(s), 10338762 read(s), max read len 101.
Kmer size: 31
There are 22961 edge(s) in edge file.
Mask contigs shorter than 33, 19233 contig(s) masked.
30756 arc(s) loaded, average weight is 111.
11483 contig(s) loaded.
Done loading updated edges.
Time spent on loading updated edges: 0s.

*****************************************************
Start to load paired-end reads information (...)

Fragment 2:
********************
Scaff (w/o '-u')
********************

Parameters: scaff -g filename -F -p 8

Files for scaffold construction are OK.

There are 1 grad(s), 10338762 read(s), max read len 101.
Kmer size: 31
There are 22961 edge(s) in edge file.
Mask contigs with coverage lower than 3.9 or higher than 78.0, and strict length 100.
Average contig coverage is 39, 14887 contig(s) masked.
Mask contigs shorter than 33, 5908 contig(s) masked.
30756 arc(s) loaded, average weight is 65.
11483 contig(s) loaded.
Done loading updated edges.
Time spent on loading updated edges: 0s.

*****************************************************
Start to load paired-end reads information (...)

Richard Buggs

Oct 8, 2012, 9:24:54 AM
to bgi-...@googlegroups.com
Hi all, 

Did anyone respond to this, please?

My scaff.log file begins like this:

Test version 2.0: released on  July 13th, 2011
Compile Jun  5 2012 11:17:12
there're 6 grads, 381167444 reads, max read len 95
K = 35
there're 13378859 edge in edge file
mask contigs with low coverage 2.2 and high coverage 44.0 and strict length 100
average contig coverage is 22, 5646385 contig masked
Mask contigs shorter than 37, 4088020 contig masked
18227836 arcs loaded, average weight is 13
input 6667512 contigs

I assume that this means that the average contig coverage is 22, once contigs with below 2.2 and above 44.0 coverage have been excluded.

Are the masked contigs included as contigs in the final assembly file after the scaffolding? I assume it is in these contigs that I would expect to find repetitive elements?

many thanks,

Richard




-- 
Richard Buggs MA, DPhil
NERC Fellow & Senior Lecturer
School of Biological and Chemical Sciences
Queen Mary, University of London
London
E1 4NS
United Kingdom

email: r.b...@qmul.ac.uk
website: http://www.sbcs.qmul.ac.uk/staff/richardbuggs.html
office: +44(0)207 882 3058
mobile: +44(0)772 992 0401
twitter: @RJABuggs

李振宇

Oct 8, 2012, 10:04:01 AM
to bgi-...@googlegroups.com
Hi Richard,

SOAPdenovo masks (or excludes, as you put it) contigs with very low or very high coverage; by default the cutoffs are 0.1 and 2 times the average coverage.
At the end of scaffolding, SOAPdenovo tries to place these masked contigs into the assembled scaffolds.
Finally, contigs that are not placed in any scaffold are output as singletons if they are 100 bp or longer. These contigs are labeled '>C', while scaffolds are labeled '>scaffold'.
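
The default cutoffs described above can be checked against the two logs quoted in this thread (average 39 with cutoffs 3.9 and 78.0; average 22 with cutoffs 2.2 and 44.0). The following is only a sketch of that arithmetic: the function name `mask_thresholds` and its parameters are made up for illustration and are not part of SOAPdenovo.

```python
def mask_thresholds(avg_cov, low_factor=0.1, high_factor=2.0):
    """Return (low, high) coverage cutoffs for a given average coverage.

    The 0.1 and 2.0 factors are the defaults stated in this thread;
    the function itself is a hypothetical helper, not SOAPdenovo code.
    """
    low = round(avg_cov * low_factor, 1)
    high = round(avg_cov * high_factor, 1)
    return low, high


# Averages taken from the two scaff logs quoted in this thread.
for avg in (39, 22):
    low, high = mask_thresholds(avg)
    print(f"average {avg}: mask below {low} or above {high}")
```

Both results agree with the "Mask contigs with coverage lower than ... or higher than ..." lines in the logs.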

As for the contig headers in the *.contig file, you are correct that the number following 'cvg' is the coverage of that contig, 41.2 in your example. The 'tip' flag indicates whether the contig has a KMER-bp overlap with other contigs at both ends: if both ends overlap with other contigs, the flag is 0; otherwise it is 1.
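
Given that header layout, the per-contig coverage can be pulled out of the *.contig file with a short script. This is a minimal sketch that assumes every header looks like the single example in this thread ('>22960 length 4378 cvg_41.2_tip_1'). The function names are hypothetical, and the thread does not say how SOAPdenovo itself computes the average it reports (weighted or not, and over which contigs), so both a plain mean and a length-weighted mean are shown.

```python
import re

# Pattern inferred from the one header shown in this thread (an assumption).
HEADER_RE = re.compile(r"^>(\S+)\s+length\s+(\d+)\s+cvg_([0-9.]+)_tip_(\d)")


def parse_contig_header(line):
    """Split one *.contig FASTA header into id, length, coverage and tip flag."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised contig header: {line!r}")
    return {
        "id": m.group(1),
        "length": int(m.group(2)),
        "cvg": float(m.group(3)),
        "tip": int(m.group(4)),
    }


def average_coverage(headers):
    """Return (plain mean, length-weighted mean) of the cvg values."""
    records = [parse_contig_header(h) for h in headers]
    if not records:
        raise ValueError("no contig headers given")
    plain = sum(r["cvg"] for r in records) / len(records)
    total_len = sum(r["length"] for r in records)
    weighted = sum(r["cvg"] * r["length"] for r in records) / total_len
    return plain, weighted


def read_headers(path):
    """Collect the header lines (those starting with '>') from a FASTA file."""
    with open(path) as fh:
        return [line for line in fh if line.startswith(">")]


print(parse_contig_header(">22960 length 4378 cvg_41.2_tip_1"))
```

For a real assembly, something like `average_coverage(read_headers("filename.contig"))` would give both means. The result may still differ from the value in scaff.log if the program filters or weights contigs differently.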

best,

From: bgi-...@googlegroups.com [bgi-...@googlegroups.com] on behalf of Richard Buggs [r.b...@qmul.ac.uk]
Sent: October 8, 2012, 21:24
To: bgi-...@googlegroups.com
Subject: Re: [BGI-SOAP:632] SOAPdenovo2 without '-u' -no average contig coverage reported
