sequenza.extract error

Dietmar Rieder

unread,

May 8, 2017, 10:08:44 AM5/8/17

to Sequenza User Group

Hi,

I'm new to sequenza and try to evaluate the tool.

Unfortunately I have troubles to use sequenza.extract on a windowed seqz file. The error is:

test2 <- sequenza.extract("xxxxxx_small.seqz.gz")
Processing chr1: Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
scan() expected 'an integer', got '15.3921568627'

the first lines in xxxxxx_small.seqz.gz are:

chromosome      position        base.ref        depth.normal    depth.tumor     depth.ratio     Af      Bf      zygosity.normal GC.percent      good.reads      AB.normal       AB.tumor        tumor strand
chr1    12928   N       15.3921568627   17.0    1.084   1.0     0       hom     63      51      N       .       0
chr1    12947   C       13      14      1.077   0.929   0       hom     63      14      C       A0.071 A1.0

xxxxxx_small.seqz.gz was created using

sequenza-utils bam2seqz [...] --chromosome chr1 > xxxxxx

sequenza-utils seqz_binning -w 50 -s xxxxxx | pigz -p 12 > xxxxxx_small.seqz.gz

Is there anything I'm doing wrong?

I'm using sequenza-utils downloaded from git on May 5th 2017.

I should note that I had to apply the following change to wig.py in order to be able to run bam2seqz:

--- wig.py      2017-04-27 10:48:48.000000000 +0200
+++ /usr/local/bioinf/python/python-2.7.13/lib/python2.7/site-packages/sequenza_utils-2.2.0-py2.7-linux-x86_64.egg/sequenza/wig.py      2017-05-08 10:12:12.000000000 +0200
@@ -27,7 +27,7 @@
            self._end = self._start + self._span
            return ((self._chromosome, self._start, self._end), (self._val,))
        except ValueError:
-            line, chrom, span = wig_line.strip().split()
+            line, chrom, span = wig_line.strip().split(None, 2)
            self._chromosome = chrom.split('chrom=')[1]
            self._span = int(span.split('span=')[1])
            pos, self._val = next(self._wig).strip().split('\t', 1)

It seems that the wig header line created from my bams has more than 3 fields.

Thanks

Dietmar

Francesco Favero

unread,

May 9, 2017, 8:07:48 AM5/9/17

to Dietmar Rieder, Sequenza User Group

Hi Dietmar,

I don’t think you are doing anything wrong, it seems a bug in the new binning function.

The average depth normal and tumor should be rounded to integers, otherwise the R package will get the error you encountered.

Can you try to modify here: https://bitbucket.org/sequenza_tools/sequenza-utils/src/1a7736cf4bfbc02937ceecdc9598d7d990b2bd43/sequenza/seqz.py?at=master&fileviewer=file-view-default#seqz.py-311

with something like:

avg_line = [self._last_chromosome, self._last_position, 'N', int(round(self._n_depth / self._n, 0)),
                    int(round(self._t_depth / self._n, 0)), round(self._ratio / self._n, 3),

                    1.0, 0, 'hom’, gc, self._n, 'N', '.', '0']

I’ve push the changes in the git repo if you don’t want to modify it yourself (I haven’t changed the wig.py code yet, so you could merge your branch or just copy the seqz.py file….).

Thanks a lot for reporting the issue and I’ll look closely into the wig issue as well: did you use a GC wig downloaded from UCSC? I might just take your fix for now.

Thanks again

Francesco

--
You received this message because you are subscribed to the Google Groups "Sequenza User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-g...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Dietmar Rieder

unread,

May 9, 2017, 8:29:42 AM5/9/17

to Sequenza User Group, dietmar...@gmail.com

Hi Francesco,

thanks for your answer. I'll try your fix.

Regarding the GC wig:

I created it on my own. The first line look like this:

variableStep chrom=chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38 span=50

The command was:

sequenza-utils GC-windows -w 50 GRCh38.d1.vd1/GRCh38.d1.vd1.fa | pigz -p 8 > GRCh38.d1.vd1.gc50Base.txt.gz

I guess the GC-windows util is using the entire fasta header lines, which in my case is:

>chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38

HTH

Dietmar

To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-group+unsub...@googlegroups.com.

Francesco Favero

unread,

May 9, 2017, 8:33:53 AM5/9/17

to Dietmar Rieder, Sequenza User Group

Yes, I remember I had the same problem in the past :).

I’ll fix both the wig.py and the GC-window function.

Thanks a lot for all the feedback!

Francesco

To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-g...@googlegroups.com.

Dietmar Rieder

unread,

May 9, 2017, 9:01:36 AM5/9/17

to Sequenza User Group, dietmar...@gmail.com

Hi,

your fix for the binning function works. Thanks

Dietmar

Francesco Favero

unread,

May 9, 2017, 9:04:43 AM5/9/17

to Dietmar Rieder, Sequenza User Group

I’ve also modified the fasta.py source, this should fix your GC file

Best

To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-g...@googlegroups.com.

Reply all

Reply to author

Forward