Re: Digest for varid-community@googlegroups.com - 1 Message in 1 Topic


uche okeke

Feb 17, 2011, 2:21:19 AM
to varid-c...@googlegroups.com
Try googling a tool called bamtools and use bamtools random to extract the alignments contig by contig. Then prepare a FASTA file for each contig and pass each contig's BAM file to VARiD by piping it from samtools, as shown in the VARiD manual (supplying the separate FASTA file that corresponds to each contig).

This will take some extra time, but I am quite sure it is a good way to get your result. I have noticed that in SHRiMP alignments some SAM header entries are empty (i.e. the contig has no alignments), and VARiD exits when it meets such a contig. You will notice this when you use the random tool in bamtools.
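A minimal Python sketch of the check described above: before handing a SHRiMP SAM file to VARiD, list the contigs declared in the @SQ header that received no mapped alignments. It assumes plain-text SAM input (a BAM file can be streamed as text with `samtools view -h`) and treats a record as mapped unless FLAG bit 0x4 is set or RNAME is `*`. The function names are hypothetical helpers, not part of VARiD, SHRiMP, or bamtools.

```python
import sys
from typing import Dict, Iterable, List


def count_alignments_per_contig(sam_lines: Iterable[str]) -> Dict[str, int]:
    """Count mapped alignments per reference sequence in SAM text.

    Contig names are seeded from @SQ header lines (SN: tag) with a count of 0,
    so contigs that never appear in an alignment line are still reported.
    Assumes well-formed SAM (tab-separated, integer FLAG in column 2).
    """
    counts: Dict[str, int] = {}
    for line in sam_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("@"):
            if line.startswith("@SQ\t"):
                for tag in line.split("\t")[1:]:
                    if tag.startswith("SN:"):
                        counts.setdefault(tag[3:], 0)
            continue  # other header lines (@HD, @PG, ...) are ignored
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # malformed alignment line; skip it
        flag, rname = int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":
            continue  # unmapped read
        counts[rname] = counts.get(rname, 0) + 1
    return counts


def empty_contigs(counts: Dict[str, int]) -> List[str]:
    """Header contigs that received no mapped alignments, in header order."""
    return [name for name, n in counts.items() if n == 0]


def _main(path: str) -> None:
    # "-" means read SAM text from standard input (e.g. piped from samtools).
    fh = sys.stdin if path == "-" else open(path)
    try:
        counts = count_alignments_per_contig(fh)
    finally:
        if fh is not sys.stdin:
            fh.close()
    for name in empty_contigs(counts):
        print(f"{name}\t0 mapped alignments")


if __name__ == "__main__" and len(sys.argv) > 1:
    _main(sys.argv[1])
```

For example, `samtools view -h aln.bam | python empty_contigs.py -`, where `empty_contigs.py` is whatever name you save the sketch under. Seeding the counts from the header, rather than only from RNAME values seen, is what makes zero-coverage contigs visible at all, since they never appear in the alignment lines.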

Good luck.

urchgene.

On Thu, Feb 17, 2011 at 4:17 AM, <varid-commu...@googlegroups.com> wrote:

Group: http://groups.google.com/group/varid-community/topics

 Topic: Running error
    Christian Probst <cmacp...@gmail.com> Feb 16 10:42AM -0800
     
    Hi,
     
    I am using VARiD 1.0.7f and I received the following error. I
    have a very small reference file and a very large number of
    generated reads, so I think that could be the cause.
     
    Thanks in advance. Here is the command and its output:
     
    varid_exec -r /solid/reference/Pool02.fasta -a Seq01_Pool002_SHRiMP.sam -o Seq01_Pool002_SNPs.pb --threads 4
    Using 4 threads
     
    - alignments: Seq01_Pool002_SHRiMP.sam
    - ref-file: /solid/reference/Pool02.fasta
     
    Detected:
    Pool02:40146 nt
    Splitting reads
    Total reads: 1124890
    Used reads: 1124890
    Done: in: 15.274 secs
    ------------------------------
    Processing fasta entry : Pool02 - 40146bp
    Processing group number : 1
    Loading reads ...
    Done in 14.582 secs
    Aligning reads ...
    There are 598 potential insert sites
    Done in: 5.698 secs
    Starting VARiD Algorithm
    Building Transition Matrix...
    Done in: 0.081 secs
    Trying to allocate 52967200 bytes...
    covmaps built in: 18.856 secs, Used reads in the covmaps: 1124890/1124890, Skipped for 0 index:0
    Computing emissions...
    SNP penalty 1.000000e-03, log -6.907755e+00
    Done in: 47.723 secs,
    fwd algo done in: 40.533 secs
    bwd algo done in: 25.573 secs
    DONE VARiD Algorithm & prediction in: 137.784 secs
    Starting primary prediction... Done in: 14.769 secs
    <nuc2col.c:143> assertion: (outcol_tmp.data[i] != (VA_UINT8)-1) failed, Last errno=34
    <run_wrap.c:781> varid_exec exiting...
    <run_wrap.c:784> varid_exec exited

     


Misko Dzamba

Feb 17, 2011, 3:04:04 AM
to varid-c...@googlegroups.com
Hi everyone,
Thanks, Uche, for the advice; it's nice to see VARiD users helping each
other! Many MFA (multi-FASTA) entries do cause some trouble (when
hundreds or more contigs are processed at once), but I think in this
case there is only one contig, of size 40146 bp, if I am not mistaken?

I had a look at the code just now, and I think what is happening is
that there is a non-'ACGT' nucleotide somewhere in the given contig
with 0 coverage from the reads. In this case VARiD cannot make any
call at that position, because there is no coverage, and therefore
leaves the reference unchanged. This causes problems in what looks
like it might be some excess code in predict.c, where VARiD converts
the letter-space SNP predictions back to colour space (which it should
not really have to do, since it already has them in colour space as
well). It tries to convert through the non-'ACGT' nucleotide, which is
not currently supported; the conversion fails, and VARiD crashes out
on a post-condition assertion
(<nuc2col.c:143> assertion: (outcol_tmp.data[i] != (VA_UINT8)-1)).
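To make this failure mode concrete, here is a minimal Python sketch of letter-space to colour-space conversion using the standard SOLiD di-base encoding (A=0, C=1, G=2, T=3; each colour is the XOR of the codes of two adjacent bases). It is not VARiD's nuc2col.c: the function names and the NO_COLOUR sentinel (standing in for `(VA_UINT8)-1`) are illustrative assumptions. An 'N' left in the predicted sequence at a zero-coverage position has no defined colour, so every transition touching it yields the sentinel and the post-condition check fails.

```python
from typing import List

# 2-bit base codes used by the SOLiD di-base (colour-space) encoding.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
# Hypothetical sentinel for "no colour", analogous to (VA_UINT8)-1 == 255.
NO_COLOUR = 0xFF


def nuc_to_colour(seq: str) -> List[int]:
    """Convert a letter-space sequence to colour space.

    A sequence of length n yields n-1 colours. Any adjacent pair that
    contains a non-ACGT symbol (e.g. 'N') gets NO_COLOUR.
    """
    colours = []
    for a, b in zip(seq, seq[1:]):
        ca, cb = BASE_CODE.get(a.upper()), BASE_CODE.get(b.upper())
        # Compare against None explicitly: code 0 (A) is falsy.
        colours.append(NO_COLOUR if ca is None or cb is None else ca ^ cb)
    return colours


def nuc_to_colour_checked(seq: str) -> List[int]:
    """Same conversion plus a post-condition in the spirit of the VARiD assertion."""
    colours = nuc_to_colour(seq)
    for i, c in enumerate(colours):
        if c == NO_COLOUR:
            raise ValueError(
                f"post-condition failed: no colour for {seq[i:i + 2]!r} at transition {i}"
            )
    return colours
```

Here `nuc_to_colour("ACGT")` gives `[1, 3, 1]`, while `nuc_to_colour_checked("ACNGT")` raises, because both transitions around the N map to the sentinel. The thread does not say how the two-line hack in the modified release handles such positions, so the sketch stops at reproducing the failure.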

I added a quick hack (two lines in nuc2col.c) that *should* fix
this problem. We have a new version that we are looking to release
soon, which will have a better solution to this problem. Until then,
if you would like, I have posted the source at
http://compbio.cs.utoronto.ca/varid/releases/varid-1.0.7f_mod.tar.gz
If you require binaries, please let me know and I can compile them for
you using our Intel C compiler suite. Also, if you have any other
problems, please let me know.

Misko

Christian Probst

Feb 17, 2011, 9:09:59 PM
to varid-community
Hello, everyone.

Thanks for the tip, Urchgene, but Misko is right.

I have only one contig, about 40 kb in size, which is in fact composed
of several amplicons (it is a targeted pool library of PCR fragments).
The amplicons are separated by stretches of 100 N nucleotides, so your
guess was right.
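Since the single Pool02 entry is a concatenation of amplicons separated by 100-N spacers, one quick way to see exactly where the unsupported symbols sit is to scan the reference for runs of non-ACGT characters. A minimal Python sketch, assuming an ordinary FASTA file (record name taken as the first word after `>`, lowercase acgt accepted as valid bases); the helper names are hypothetical and not part of VARiD.

```python
import re
import sys
from typing import Dict, Iterable, List, Tuple

# One or more consecutive characters that are not A/C/G/T (either case).
NON_ACGT = re.compile(r"[^ACGTacgt]+")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record name: sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {k: "".join(v) for k, v in records.items()}


def non_acgt_runs(seq: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, symbols) for each maximal non-ACGT run.

    Coordinates are 1-based and inclusive; symbols lists the distinct
    characters found in the run (e.g. "N").
    """
    return [
        (m.start() + 1, m.end(), "".join(sorted(set(m.group()))))
        for m in NON_ACGT.finditer(seq)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        for rec_name, rec_seq in read_fasta(fh).items():
            for start, end, symbols in non_acgt_runs(rec_seq):
                print(f"{rec_name}\t{start}-{end}\tlength={end - start + 1}\t{symbols}")
```

Run as `python find_n_runs.py Pool02.fasta` (script name hypothetical). With the layout described above, one would expect each spacer to show up as a run of length 100.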

I will test the modified version.

Thanks everybody for the support.

Christian

