sequenza.fit and sequenza.result for female

800 views
Skip to first unread message

YunSuhk Suh

unread,
Mar 7, 2019, 9:33:59 PM3/7/19
to Sequenza User Group
Hello Francesco,

I am really enjoying 'sequenza' to call somatic CN and appreciate your contribution.


During sequenza.fit , I got an error like below.


```
> test <- sequenza.fit(seqzdata, chromosome.list = c(1:22, "X", "Y"), female = FALSE)

Error in result$LPP - max.lik : non-numeric argument to binary operator

In addition: There were 50 or more warnings (use warnings() to see the first 50)

> warnings()

Warning messages:

1: In parallel::mclapply(X[Split[[i]]], FUN, ..., mc.cores = as.integer(cl)) :

all scheduled cores encountered errors in user code

2: In parallel::mclapply(X[Split[[i]]], FUN, ..., mc.cores = as.integer(cl)) :

all scheduled cores encountered errors in user code
.
.
.
```

If I converted female=TRUE, there was no error, but this sample came from male.

what was wrong with my code?



YunSuhk Suh

unread,
Mar 8, 2019, 8:12:21 PM3/8/19
to Sequenza User Group
Hello again,

I change the option as
chromosome list=c(1:22)


the error in result$LPP was gone but still other error messages were remained as below.


There were 50 or more warnings (use warnings() to see the first 50)

> warnings()

Warning messages:

1: In read_tokens_(data, tokenizer, col_specs, col_names, ... :

 length of NULL cannot be changed

2: In read_tokens_(data, tokenizer, col_specs, col_names, ... :

 length of NULL cannot be changed

Could you point out the problem?



2019년 3월 7일 목요일 오후 9시 33분 59초 UTC-5, YunSuhk Suh 님의 말:

Joseph Fass

unread,
Mar 27, 2019, 2:54:47 PM3/27/19
to Sequenza User Group
I'm getting this behavior as well, on a seqz file that only contains chromosome 6. This doesn't change if I the 'female' option to sequenza.fit, and it doesn't change if I look in the BAF$`6` data frame and change NAs to zeros.

Joseph Fass

unread,
Mar 27, 2019, 3:12:44 PM3/27/19
to Sequenza User Group
Hmmm ... I'm not sure I'm having the same problem. Most of my warnings were:

2: In parallel::mclapply(X[Split[[i]]], FUN, ..., mc.cores = as.integer(cl)) :
  all scheduled cores encountered errors in user code

... but actually the first one is:

1: In max(segs.all$sd.BAF, na.rm = TRUE) :
  no non-missing arguments to max; returning -Inf

So I looked in the segments element and:

> head( MN.all$segments$`1` )
  chromosome start.pos end.pos Bf N.BAF sd.BAF depth.ratio N.ratio   sd.ratio
1          1     12753   13818 NA     0     NA   2.1151503     839 1.74910077
2          1     14468   14491 NA     0     NA   1.9589683      23 0.15333392
3          1     14492   14504 NA     0     NA   1.3124848      12 0.14306091
4          1     14505   14516 NA     0     NA   1.1453861      11 0.06399233
5          1     14517   14547 NA     0     NA   0.9646945      30 0.04936544
6          1     14548   14589 NA     0     NA   1.0583347      41 0.04394441

Bf and sd.BAF are all NAs, for every chromosome. Could this be due to a change between version 2.something and 3?

~Joe

Joseph Fass

unread,
Mar 28, 2019, 4:08:58 PM3/28/19
to Sequenza User Group
OK so I may have hijacked this thread, but I thought I'd try to resolve this here in case anybody finds this.

So when I've run bam2seqz using --parallel, I've seen seqz files that look like this:

chromosome      position        base.ref        depth.normal    depth.tumor     depth.ratio     Af      Bf      zygosity.normal GC.percent good.reads       AB.normal       AB.tumor        tumor.strand
1       12759   C       3       18      6.0     1.0     0       om      72      18      C       .       0
1       12760   C       2       19      9.5     1.0     0       om      72      19      C       .       0
1       12762   C       2       20      10.0    1.0     0       om      72      20      C       .       0
1       12763   T       3       20      6.667   1.0     0       om      72      20      T       .       0
1       12764   A       3       20      6.667   1.0     0       om      72      20      A       .       0

Column 8 (Bf) is *always* zero, and column 9 (zygosity.normal) is a mix of "hom" and "om" (!).
I just started a run with only one chromosome and not using --parallel, and already I see this:

chromosome      position        base.ref        depth.normal    depth.tumor     depth.ratio     Af      Bf      zygosity.normal GC.percent good.reads       AB.normal       AB.tumor        tumor.strand
chr6    292268  G       5       15      3.0     1.0     0       hom     72      15      G       .       0
chr6    292269  G       6       14      2.333   1.0     0       hom     72      14      G       .       0
chr6    292270  A       7       14      2.0     1.0     0       hom     72      14      A       .       0
chr6    292271  C       7       15      2.143   1.0     0       hom     72      15      C       .       0
chr6    292272  C       8       16      2.0     1.0     0       hom     72      16      C       .       0
chr6    292273  A       8       16      2.0     1.0     0       hom     72      16      A       .       0
chr6    292274  T       9       15      1.667   1.0     0       hom     72      15      T       .       0

Well, that doesn't show column 8, but it *does* contain a range of values from 0 to 0.5, and column 9 contains a mix of "hom" and "het". So maybe --parallel is the culprit? Because I know I've run on the whole genome before and gotten usable data.

More when I get it ...

~Joe

Joseph Fass

unread,
Apr 1, 2019, 11:41:52 AM4/1/19
to Sequenza User Group
Yeesh. As so often happens, I did this to myself. I was creating those 'om' lines with some faulty post-transformation steps. Disregard my posts above.
~Joe

Francesco Favero

unread,
Apr 1, 2019, 11:54:18 AM4/1/19
to Joseph Fass, Sequenza User Group
:D, I was starting to worry I had to revisit all the parse.

I'm super busy so I still haven't had time to try to replicate Yun-Sunk error.

Best

Francesco


--
You received this message because you are subscribed to the Google Groups "Sequenza User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-g...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Shixiang Wang

unread,
Dec 2, 2019, 1:36:19 AM12/2/19
to Sequenza User Group
I have a similar issue, is there a way to fix? It makes about 100 samples failed by sequenza.

Collecting GC information . done


Processing chr1:

   1 variant calls.

   0 copy-number segments.

   0 heterozygous positions.

   3 homozygous positions.

Processing chr2:

   1 variant calls.

   0 copy-number segments.

   0 heterozygous positions.

   1 homozygous positions.

Processing chr3:

   1 variant calls.

   0 copy-number segments.

   0 heterozygous positions.

   1 homozygous positions.

Processing chr5:

   1 variant calls.

   0 copy-number segments.

   0 heterozygous positions.

   1 homozygous positions.

Processing chr6:

   1 variant calls.

   0 copy-number segments.

   0 heterozygous positions.

   1 homozygous positions.

Processing chr7:

   1 variant calls.

   0 copy-number segments.

   1 heterozygous positions.

   3 homozygous positions.

Processing chr8:

   1 variant calls.

   1 copy-number segments.

   0 heterozygous positions.

   2 homozygous positions.

Processing chr11:

   1 variant calls.

   0 copy-number segments.

   0 heterozygous positions.

   1 homozygous positions.

Processing chr12:

   1 variant calls.

   0 copy-number segments.

   0 heterozygous positions.

   9 homozygous positions.

Processing chr14:

   1 variant calls.

   0 copy-number segments.

   0 heterozygous positions.

   1 homozygous positions.

Processing chr16:

   1 variant calls.

   0 copy-number segments.

   0 heterozygous positions.

   2 homozygous positions.

Processing chr17:

   1 variant calls.

   0 copy-number segments.

   0 heterozygous positions.

   1 homozygous positions.

Processing chr19:

   1 variant calls.

   0 copy-number segments.

   0 heterozygous positions.

   7 homozygous positions.

Processing chr20:

   1 variant calls.

   0 copy-number segments.

   0 heterozygous positions.

   1 homozygous positions.

Processing chrX:

   1 variant calls.

   0 copy-number segments.

   0 heterozygous positions.

   2 homozygous positions.

Processing chrY:

   1 variant calls.

   1 copy-number segments.

   0 heterozygous positions.

   3 homozygous positions.

Warning messages:

1: In b[which(diff(b) == 0) + 1] <- b[diff(b) == 0] + offset :

  number of items to replace is not a multiple of replacement length

2: In b[which(diff(b) == 0) + 1] <- b[diff(b) == 0] + offset :

  number of items to replace is not a multiple of replacement length

3: In b[which(diff(b) == 0) + 1] <- b[diff(b) == 0] + offset :

  number of items to replace is not a multiple of replacement length

4: In b[which(diff(b) == 0) + 1] <- b[diff(b) == 0] + offset :

  number of items to replace is not a multiple of replacement length

5: In b[which(diff(b) == 0) + 1] <- b[diff(b) == 0] + offset :

  number of items to replace is not a multiple of replacement length

6: In b[which(diff(b) == 0) + 1] <- b[diff(b) == 0] + offset :

  number of items to replace is not a multiple of replacement length

7: In b[which(diff(b) == 0) + 1] <- b[diff(b) == 0] + offset :

  number of items to replace is not a multiple of replacement length

8: In b[which(diff(b) == 0) + 1] <- b[diff(b) == 0] + offset :

  number of items to replace is not a multiple of replacement length

9: In b[which(diff(b) == 0) + 1] <- b[diff(b) == 0] + offset :

  number of items to replace is not a multiple of replacement length

Error in result$LPP - max.lik : non-numeric argument to binary operator

Calls: sequenza.fit -> baf.model.fit

In addition: There were 50 or more warnings (use warnings() to see the first 50)

Execution halted

zeynep kaçar

unread,
Dec 6, 2019, 3:13:59 PM12/6/19
to Sequenza User Group
Hello,

Did you find a solution for this problem ? Thanks!

Zeynep

Francesco Favero

unread,
Dec 6, 2019, 4:16:13 PM12/6/19
to Sequenza User Group
Hi Shixiang,

sorry for the late reply.

The log tells me that there are no heterozygous positions and very few homozygous position for each chromosome (1 or 2) to calculate segments.
All in all it seems a very sparse data. Is that from a panel sequencing or some ad-hoc filtered data?
The best scenario would be to have whole exome or whole genome information, so to calculate allele frequency and depth-ratio across multiple SNPs and variants for each segments.

If the data is from whole genome/exome then the pre-processing step is doing something funny. How did you generated the seqz files?

Best

Francesco

--
You received this message because you are subscribed to the Google Groups "Sequenza User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-g...@googlegroups.com.

zeynep kaçar

unread,
Dec 6, 2019, 5:14:30 PM12/6/19
to Sequenza User Group

Hello Francesco,

I am pretty new in this area! and I try to use Sequenza to get copy number and purity inputs for PyClone software. I am also getting the same warnings with an extra error message regarding chromosome M. I used VCFs file to generate seqz files and I have WES data. So far I tried in 4 patients and got similar warnings and errors. Could you please help me with this! Thanks!!
Here is my output:

Collecting GC information . done

Processing chr1:
   1 variant calls.
   0 copy-number segments.
   22 heterozygous positions.
   36 homozygous positions.
Processing chr2:
   1 variant calls.
   0 copy-number segments.
   22 heterozygous positions.
   31 homozygous positions.
Processing chr3:
   1 variant calls.
   0 copy-number segments.
   12 heterozygous positions.
   14 homozygous positions.
Processing chr4:
   1 variant calls.
   0 copy-number segments.
   9 heterozygous positions.
   16 homozygous positions.
Processing chr5:
   1 variant calls.
   0 copy-number segments.
   9 heterozygous positions.
   11 homozygous positions.
Processing chr6:
   1 variant calls.
   0 copy-number segments.
   10 heterozygous positions.
   22 homozygous positions.
Processing chr7:
   1 variant calls.
   0 copy-number segments.
   8 heterozygous positions.
   11 homozygous positions.
Processing chr8:
   1 variant calls.
   0 copy-number segments.
   9 heterozygous positions.
   18 homozygous positions.
Processing chr9:
   1 variant calls.
   0 copy-number segments.
   15 heterozygous positions.
   17 homozygous positions.
Processing chr10:
   1 variant calls.
   0 copy-number segments.
   9 heterozygous positions.
   11 homozygous positions.
Processing chr11:
   1 variant calls.
   0 copy-number segments.
   8 heterozygous positions.
   23 homozygous positions.
Processing chr12:
   1 variant calls.
   0 copy-number segments.
   13 heterozygous positions.
   23 homozygous positions.
Processing chr13:
   1 variant calls.
   0 copy-number segments.
   6 heterozygous positions.
   3 homozygous positions.
Processing chr14:
   1 variant calls.
   0 copy-number segments.
   7 heterozygous positions.
   11 homozygous positions.
Processing chr15:
   1 variant calls.
   0 copy-number segments.
   7 heterozygous positions.
   18 homozygous positions.
Processing chr16:
   1 variant calls.
   0 copy-number segments.
   7 heterozygous positions.
   32 homozygous positions.
Processing chr17:
   1 variant calls.
   0 copy-number segments.
   7 heterozygous positions.
   21 homozygous positions.
Processing chr18:
   1 variant calls.
   0 copy-number segments.
   1 heterozygous positions.
   6 homozygous positions.
Processing chr19:
   1 variant calls.
   1 copy-number segments.
   8 heterozygous positions.
   25 homozygous positions.
Processing chr20:
   1 variant calls.
   0 copy-number segments.
   8 heterozygous positions.
   8 homozygous positions.
Processing chr21:
   1 variant calls.
   0 copy-number segments.
   3 heterozygous positions.
   8 homozygous positions.
Processing chr22:
   1 variant calls.
   0 copy-number segments.
   7 heterozygous positions.
   11 homozygous positions.
Processing chrX:
   1 variant calls.
   0 copy-number segments.
   10 heterozygous positions.
   14 homozygous positions.
Processing chrY:
   1 variant calls.
   0 copy-number segments.
   1 heterozygous positions.
   1 homozygous positions.
Processing chrM:
Error: 'chrM 9027 C 21 9 0.429 1.0 0 hom 52 9 C . 0' does not exist in current working directory ('/Users/kacarz2').
In addition: Warning messages:

Francesco Favero

unread,
Dec 6, 2019, 5:48:17 PM12/6/19
to zeynep kaçar, Sequenza User Group
Hi Zeynep,

can you tell me how you generated the input, which VCF file have you used?
You should have germline calls mixed with somatic calls. Ideally a very raw file
Eg mutec/mutect2 is not optimal as it calls only the somatic.
If in doubt and if you can, you can start from the bam files. This should give you the right approach.

Also don’t include the M chromosome, use only 1:22,X,Y chromosome.

Best

Francesco

--
You received this message because you are subscribed to the Google Groups "Sequenza User Group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-g...@googlegroups.com.

zeynep kaçar

unread,
Dec 6, 2019, 8:58:16 PM12/6/19
to Sequenza User Group
Ohh you are right, I was using Raw Simple Somatic Mutation with Muse workflow. So using those VCFs files may be not a good way. 
Thank you very much for your quick response! I will try to start from bam files. Hopefully, this time it will work.

Best,

Zeynep
Francesco

To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-group+unsub...@googlegroups.com.

Shixiang Wang

unread,
Dec 7, 2019, 10:04:47 AM12/7/19
to Sequenza User Group
Dear Francesco,

Thanks for the reply. All my raw data are WES and they come from several dbGap studies. 
There are about 1000 samples and I used the default parameters and standard pipeline to process them all from bam -> seqz -> small-seqz -> results (I have attached the template code).
I found about 50 samples got the same error as shown above.

The small-seqz file sizes are different, so I think the samples may have different sequencing depth, 
but I don't know how to calculate exome sequencing depth for these samples and go further to check this.

I am wondering if my method is right for processing all 1000 samples using the same pipeline and there is a solution to check fix the error above?

Should I just drop the samples with error and go to the next analysis or should I change my processing pipeline? The sequenza result is the key data for further analysis.


Best,
Shixiang


snapshot.PNG




在 2019年12月7日星期六 UTC+8上午5:16:13,Francesco Favero写道:
Hi Shixiang,
Francesco

To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-group+unsub...@googlegroups.com.

zeynep kaçar

unread,
Dec 7, 2019, 12:39:37 PM12/7/19
to Sequenza User Group

Hi Francesco,

I am now running tumor-normal bam files for one patient, but it is already 1 hour and 30 minutes and it is still running (I mean the code for sequenza-utils to generate seqz file). I wonder if it is normal or something is wrong? Thank you very much again being so responsive and helpful!

Best,

Zeynep
Francesco

To unsubscribe from this group and stop receiving emails from it, send an email to sequenza-user-group+unsub...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages