help for soapdenovo2


sendru

Jan 10, 2013, 12:37:59 AM
to bgi-...@googlegroups.com
Hello,

I recently followed the YH_assembly pipeline from the SOAPdenovo2 paper to assemble the American black bear genome, and I have three questions.

The first concerns the -y option in SOAPfilter_v2.0, which filters reads containing adapter sequence. I don't know the exact adapter; all I know is that the reads are Illumina HiSeq 2000, 101 bp paired-end. Should I use the default adapter setting, or skip the adapter-contamination check altogether?

The second concerns sequence coverage, which does not look good. The raw data is about 90 Gb, but only about 70 Gb is left after filtering low-quality reads and duplicates, and SOAPdenovo2 reports an average read coverage per contig of 12. As a result, when I align black bear ESTs (an independent data source) to the assembled genome, I find a lot of sequence errors. Should I set different parameters for my data, and if so, how?

The third concerns the input files. I know that base quality is used in the error-correction step, but after that, does the SOAPdenovo module use the base-quality information? If FASTQ input is treated the same as FASTA, I would prefer that Corrector_HA_v2.0 output FASTA.

Thank you.

Sendru

The pre-computation steps are:

SOAPfilter_v2.0 -f 0 -t 32 -l 300 -z -p -g ABB1.fq ABB2.fq ABB.filter.stat ABB1.fq.clean ABB2.fq.clean
KmerFreq_HA_v2.0 -k 23 -f 0 -t 32 -i 400000000 -L 101 -l ecfile -p ABB_k23
Corrector_HA_v2.0 -k 23 -l 2 -e 1 -w 1 -q 30 -r 45 -t 32 ABB_k23.freq.gz ecfile

The config file is:

#maximal read length
max_rd_len=101
[LIB]
#average insert size
avg_ins=300
#if sequence needs to be reversed
reverse_seq=0
#in which part(s) the reads are used
asm_flags=3
#use only first 101 bps of each read (no trimming)
rd_len_cutoff=101
#in which order the reads are used while scaffolding
rank=1
# cutoff of pair number for a reliable connection (default 3/5)
pair_num_cutoff=3
#minimum aligned length to contigs for a reliable read location pair-end/ mate-pair (default 32/35)
map_len=32

q1=/picb/functgen3/AmericanBlackBear/ABB1.fq.clean.gz.cor.pair_1.fq
q2=/picb/functgen3/AmericanBlackBear/ABB2.fq.clean.gz.cor.pair_2.fq

And the main steps are:

pregraph_sparse_63mer.v1.0.3 -s /picb/functgen3/AmericanBlackBear/soapABBconfig -K 35 -z 3000000000 -o 35mer -p 32
SOAPdenovo-63mer contig -s /picb/functgen3/AmericanBlackBear/soapABBconfig -g 35mer -p 32 -M 2
SOAPdenovo-63mer map -s /picb/functgen3/AmericanBlackBear/soapABBconfig -g 35mer -p 32
SOAPdenovo-63mer scaff -g 35mer -p 32 -F
GapCloser -a 35mer.scafSeq -b /picb/functgen3/AmericanBlackBear/soapABBconfig -o 35merfinal.fa -t 32 -p 31 -l 101
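
For context on the coverage question, here is a rough back-of-the-envelope sketch (not part of the original post). It assumes a genome size of about 3 Gb, taken from the -z 3000000000 value passed to pregraph above, and uses the usual relation between base depth and k-mer depth, depth × (L − K + 1) / L. The function names are illustrative only.

```python
def base_depth(total_bases: float, genome_size: float) -> float:
    """Average per-base sequencing depth."""
    return total_bases / genome_size


def kmer_depth(depth: float, read_len: int, k: int) -> float:
    """Expected k-mer depth for reads of length read_len at a given base depth."""
    return depth * (read_len - k + 1) / read_len


GENOME_SIZE = 3e9  # assumption: the -z value given to pregraph
READ_LEN = 101     # max_rd_len in the config file
K = 35             # -K value given to pregraph

raw_depth = base_depth(90e9, GENOME_SIZE)    # before filtering
clean_depth = base_depth(70e9, GENOME_SIZE)  # after filtering
clean_kmer_depth = kmer_depth(clean_depth, READ_LEN, K)

print(f"raw base depth:      {raw_depth:.1f}x")
print(f"filtered base depth: {clean_depth:.1f}x")
print(f"k-mer depth (K={K}): {clean_kmer_depth:.1f}x")
```

Under these assumptions the filtered data corresponds to roughly 23x base depth and roughly 15x k-mer depth at K=35. The thread does not say whether the reported figure of 12 is a base depth or a k-mer depth, so the comparison is only approximate.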


Albert J.Poustka

Jan 10, 2013, 4:46:22 AM
to bgi-...@googlegroups.com
Hi,

Where can I find SOAPfilter_v2.0?

Greetings,

Albert
--
You received this message because you are subscribed to the Google Groups "BGI-SOAP" group.
To post to this group, send email to bgi-...@googlegroups.com.
To unsubscribe from this group, send email to bgi-soap+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msg/bgi-soap/-/Og0IvAksUkQJ.
For more options, visit https://groups.google.com/groups/opt_out.
 
 


  

Ruibang Luo

Jan 10, 2013, 4:50:12 AM
to bgi-...@googlegroups.com
ftp://public.genomics.org.cn/BGI/SOAPdenovo2

rb




Bent Petersen

Jan 10, 2013, 5:05:05 AM
to bgi-...@googlegroups.com
I'm not able to connect to the FTP server. Does anyone else have that problem?

Bent Petersen
Post.doc, Ph.D.

Center for Biological Sequence Analysis
Department of Systems Biology
Technical University of Denmark
Kemitorvet, Building 208
2800 Lyngby

Mobile phone: (+45) 2084 7492
Skype ID: bentpetersen1979
E-mail: be...@cbs.dtu.dk
www.cbs.dtu.dk/

Ruibang Luo

Jan 10, 2013, 5:24:22 AM
to bgi-...@googlegroups.com
Try using FileZilla; it works for me.

rb

sendru

Jan 11, 2013, 3:57:17 AM
to bgi-...@googlegroups.com, luoru...@genomics.org.cn
Hi,

I understand that this may not be the best place to resolve my first two questions (unknown adapter and low read coverage). But could you explain whether it makes a difference if the error-correction module outputs FASTQ or FASTA?

Thank you.

Sendru


Ruibang Luo

Jan 11, 2013, 4:52:46 AM
to sendru, bgi-...@googlegroups.com
There is no difference to SOAPdenovo2.

rb
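
If FASTA is preferred for storage, the corrected FASTQ files can be converted afterwards. The following is a minimal sketch, not part of the SOAPdenovo2 tools; it assumes plain four-line FASTQ records (no wrapped sequence lines), and the function name is hypothetical.

```python
import io
from typing import TextIO


def fastq_to_fasta(fastq: TextIO, fasta: TextIO) -> int:
    """Copy sequences from 4-line FASTQ records to FASTA, dropping qualities.

    Returns the number of records written.
    """
    count = 0
    while True:
        header = fastq.readline()
        if not header:  # end of file
            break
        if not header.strip():  # tolerate blank lines between records
            continue
        seq = fastq.readline().rstrip("\r\n")
        plus = fastq.readline()
        qual = fastq.readline()
        if not header.startswith("@") or not plus.startswith("+") or not qual:
            raise ValueError(f"malformed FASTQ record near read {count + 1}")
        # FASTA keeps the read name and sequence only.
        fasta.write(">" + header[1:].rstrip("\r\n") + "\n" + seq + "\n")
        count += 1
    return count


# Tiny in-memory demonstration with two made-up reads.
demo_in = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n")
demo_out = io.StringIO()
n_converted = fastq_to_fasta(demo_in, demo_out)
print(n_converted, "records converted")
print(demo_out.getvalue(), end="")
```

For real files, open the input and output with `open(path)` and `open(path, "w")` and pass the handles to the function in the same way.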




kaboroevich

Mar 1, 2013, 3:49:10 AM
to bgi-...@googlegroups.com, luoru...@genomics.org.cn
Hi,

Is it possible to filter single-end reads with 'SOAPfilter_v2.0'?

Keith

lizhenyu

Mar 1, 2013, 5:21:10 AM
to bgi-...@googlegroups.com, luoru...@genomics.org.cn
You can make fake PE read files by using the single-end reads as both read1 and read2 of a PE library, and then setting the same cutoffs for read1 and read2.

However, adapter filtering depends on real PE reads, so you can't filter adapters this way. There is also no need to apply the filter for PE reads with undersized inserts.

In the end you will get filtered read1 and read2 files that are identical. Use one of them as your final result.
 
2013-03-01

BGI  Zhenyu Li
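
The file handling around that workaround can be sketched as follows. This is only an illustration under stated assumptions: the function names and file names are hypothetical, and the SOAPfilter_v2.0 run itself is left out because the thread does not give the options for this case. The sketch creates the two fake pair files and provides a check that two files (for example, the two filtered outputs) are byte-for-byte identical.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path
from typing import Tuple


def make_fake_pairs(single_end: Path, out_dir: Path) -> Tuple[Path, Path]:
    """Copy one single-end FASTQ file to two files used as fake read1/read2."""
    out_dir.mkdir(parents=True, exist_ok=True)
    read1 = out_dir / "fake_1.fq"  # hypothetical output names
    read2 = out_dir / "fake_2.fq"
    shutil.copyfile(single_end, read1)
    shutil.copyfile(single_end, read2)
    return read1, read2


def same_content(file_a: Path, file_b: Path) -> bool:
    """Return True if both files have identical contents (full comparison)."""
    return filecmp.cmp(file_a, file_b, shallow=False)


# Demonstration on a temporary, made-up single-end file.
with tempfile.TemporaryDirectory() as tmp:
    single = Path(tmp) / "single.fq"
    single.write_text("@read1\nACGT\n+\nIIII\n")
    r1, r2 = make_fake_pairs(single, Path(tmp) / "fake_pe")
    pairs_match = same_content(r1, r2)
    print("fake pair files identical:", pairs_match)
```

After filtering, running `same_content` on the two cleaned files is a quick way to confirm they are identical before keeping just one of them.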
