sequenza.extract error

270 views
Skip to first unread message

Dietmar Rieder

unread,
May 8, 2017, 10:08:44 AM5/8/17
to Sequenza User Group
Hi,

I'm new to sequenza and try to evaluate the tool.
Unfortunately I have troubles to use sequenza.extract on a windowed seqz file. The error is:

test2 <- sequenza.extract("xxxxxx_small.seqz.gz")
Processing chr1: Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :  
 scan() expected 'an integer', got '15.3921568627'


the first lines in xxxxxx_small.seqz.gz are:

chromosome      position        base.ref        depth.normal    depth.tumor     depth.ratio     Af      Bf      zygosity.normal GC.percent      good.reads      AB.normal       AB.tumor        tumor strand
chr1    12928   N       15.3921568627   17.0    1.084   1.0     0       hom     63      51      N       .       0
chr1    12947   C       13      14      1.077   0.929   0       hom     63      14      C       A0.071  A1.0


xxxxxx_small.seqz.gz was created using

sequenza-utils bam2seqz [...] --chromosome chr1 xxxxxx
sequenza-utils seqz_binning -w 50 -s xxxxxx | pigz -p 12 > xxxxxx_small.seqz.gz


Is there anything I'm doing wrong?
I'm using sequenza-utils downloaded from git on May 5th 2017.


I should note that I had to apply the following change to wig.py in order to be able to run bam2seqz:

--- wig.py      2017-04-27 10:48:48.000000000 +0200
+++ /usr/local/bioinf/python/python-2.7.13/lib/python2.7/site-packages/sequenza_utils-2.2.0-py2.7-linux-x86_64.egg/sequenza/wig.py      2017-05-08 10:12:12.000000000 +0200
@@ -27,7 +27,7 @@
            self._end = self._start + self._span
            return ((self._chromosome, self._start, self._end), (self._val,))
        except ValueError:
-            line, chrom, span = wig_line.strip().split()
+            line, chrom, span = wig_line.strip().split(None, 2)
            self._chromosome = chrom.split('chrom=')[1]
            self._span = int(span.split('span=')[1])
            pos, self._val = next(self._wig).strip().split('\t', 1)


It seems that the wig header line created from my bams has more than 3 fields.

Thanks
  Dietmar

Francesco Favero

unread,
May 9, 2017, 8:07:48 AM5/9/17
to Dietmar Rieder, Sequenza User Group
Hi Dietmar,

I don’t think you are doing anything wrong, it seems a bug in the new binning function.
The average depth normal and tumor should be rounded to integers, otherwise the R package will get the error you encountered.


with something like:

avg_line = [self._last_chromosome, self._last_position, 'N', int(round(self._n_depth / self._n, 0)),
                    int(round(self._t_depth / self._n, 0)), round(self._ratio / self._n, 3),
                    1.0, 0, 'hom’, gc, self._n, 'N', '.', '0']

I’ve push the changes in the git repo if you don’t want to modify it yourself (I haven’t changed the wig.py code yet, so you could merge your branch or just copy the seqz.py file….).

Thanks a lot for reporting the issue and I’ll look closely into the wig issue as well: did you use a GC wig downloaded from UCSC? I might just take your fix for now.

Thanks again

Francesco

--
You received this message because you are subscribed to the Google Groups "Sequenza User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-g...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dietmar Rieder

unread,
May 9, 2017, 8:29:42 AM5/9/17
to Sequenza User Group, dietmar...@gmail.com
Hi Francesco,

thanks for your answer. I'll try your fix.

Regarding the GC wig:
I created it on my own. The first line look like this:

variableStep chrom=chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38 span=50

The command was:
sequenza-utils GC-windows -w 50 GRCh38.d1.vd1/GRCh38.d1.vd1.fa | pigz -p 8 > GRCh38.d1.vd1.gc50Base.txt.gz

I guess the GC-windows util is using the entire fasta header lines, which in my case is:

>chr1  AC:CM000663.2  gi:568336023  LN:248956422  rl:Chromosome  M5:6aef897c3d6ff0c78aff06ac189178dd  AS:GRCh38



HTH
  Dietmar

To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-group+unsub...@googlegroups.com.

Francesco Favero

unread,
May 9, 2017, 8:33:53 AM5/9/17
to Dietmar Rieder, Sequenza User Group
Yes, I remember I had the same problem in the past :).

I’ll fix both the wig.py and the GC-window function.
Thanks a lot for all the feedback!


Francesco



To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-g...@googlegroups.com.

Dietmar Rieder

unread,
May 9, 2017, 9:01:36 AM5/9/17
to Sequenza User Group, dietmar...@gmail.com
Hi,

your fix for the binning function works. Thanks

Dietmar

Francesco Favero

unread,
May 9, 2017, 9:04:43 AM5/9/17
to Dietmar Rieder, Sequenza User Group
I’ve also modified the fasta.py source, this should fix your GC file


Best


To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-g...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages