Read block operation error in variant calling

ni...@cc-tdi.org

Aug 2, 2017, 7:49:42 PM
to Platypus Users

Hi,

I'm running Platypus on a set of matched normal and cancer samples. When I try to call variants, the terminal fills with the following error at (almost) every location. The exceptions are the X and Y chromosomes, and even there it fails on a few regions of X. I've copied the end of the error output below, so you can see the contrast between these chromosomes.

2017-08-02 16:47:53,520 - INFO - Processing region X:11900000-12000000. (Only printing this message every 10 regions of size 100000)
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
2017-08-02 16:47:53,616 - INFO - Processing region X:12900000-13000000. (Only printing this message every 10 regions of size 100000)
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
2017-08-02 16:47:53,734 - INFO - Processing region X:13900000-14000000. (Only printing this message every 10 regions of size 100000)
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
2017-08-02 16:47:53,798 - INFO - Processing region X:14900000-15000000. (Only printing this message every 10 regions of size 100000)
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
2017-08-02 16:47:53,945 - INFO - Processing region X:15900000-16000000. (Only printing this message every 10 regions of size 100000)
2017-08-02 16:47:54,026 - INFO - Processing region X:16900000-17000000. (Only printing this message every 10 regions of size 100000)
2017-08-02 16:47:54,110 - INFO - Processing region X:17900000-18000000. (Only printing this message every 10 regions of size 100000)

Andy Rimmer

Aug 3, 2017, 3:51:12 AM
to ni...@cc-tdi.org, Platypus Users
Hi Nick,

I don't think I've seen that one before. The error is coming from htslib, which is failing to read the BAMs for some reason. This could be because the files or the indexes are corrupted, or it could be a permissions problem. Another possibility is that you are trying to read too many files at once. How many matched tumour/normal pairs are in the set?
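
For anyone hitting the same error, the permission and "too many files" possibilities can be ruled out quickly before a run. Below is a minimal sketch, not part of Platypus: the function name `preflight` is made up for illustration, and the rule of "at least one file descriptor per input file" is an assumption rather than a documented htslib figure. It does not detect corrupted files or indexes.

```python
import os

try:
    import resource  # Unix-only module; not available on Windows
except ImportError:
    resource = None


def preflight(paths):
    """Return a list of human-readable problems found for the input files."""
    problems = []
    for path in paths:
        if not os.path.isfile(path):
            problems.append(f"{path}: not found")
        elif not os.access(path, os.R_OK):
            problems.append(f"{path}: not readable by the current user")

    if resource is not None:
        soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        # Assumption: every input file needs at least one open descriptor.
        if soft != resource.RLIM_INFINITY and len(paths) >= soft:
            problems.append(
                f"{len(paths)} input files, but the open-file limit is {soft}"
            )
    return problems
```

Calling `preflight([...])` with the BAM and index paths returns an empty list when none of these checks fail.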

Kind regards,
Andy


--
You received this message because you are subscribed to the Google Groups "Platypus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platypus-users+unsubscribe@googlegroups.com.
To post to this group, send email to platypus-users@googlegroups.com.
Visit this group at https://groups.google.com/group/platypus-users.
To view this discussion on the web, visit https://groups.google.com/d/msgid/platypus-users/d3be9ce0-4a26-4118-814d-134d5d08ee56%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Dr Andrew (Andy) Rimmer

ni...@cc-tdi.org

Aug 3, 2017, 11:33:58 AM
to Platypus Users, ni...@cc-tdi.org
In this test set, I'm just using one normal and one tumor. If possible, though, I would like to scale this up to one tumor and, say, 40 normals, as per GATK's recommendation for a panel of normals.
It looks like it could be a problem with the BAM index.

Here's my command and the start of the error output:
python /abs/path/to/Platypus_0.8.1/Platypus.py callVariants -o indels_matched.vcf --bamFiles=/abs/path/to/tumordna.bam,/abs/path/to/normaldna.bam --logFileName=platypus_trouble.log --refFile=/abs/path/to/hg38.fa --genIndels=1 --genSNPs=0
2017-08-03 08:18:16,496 - INFO - Beginning variant calling
2017-08-03 08:18:16,496 - INFO - Output will go to normaldna.vcf
[W::hts_idx_load2] The index file is older than the data file: /abs/path/to/tumordna.bai
[W::hts_idx_load2] The index file is older than the data file: /abs/path/to/tumordna.bai
2017-08-03 08:18:17,157 - INFO - Processing region 1:0-100000. (Only printing this message every 10 regions of size 100000)

[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
2017-08-03 08:18:17,441 - INFO - Processing region 1:1000000-1100000. (Only printing this message every 10 regions of size 100000)

[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes
2017-08-03 08:18:18,102 - INFO - Processing region 1:2000000-2100000. (Only printing this message every 10 regions of size 100000)

[E::bgzf_read] Read block operation failed with error -1 after 0 of 4 bytes

However, I'm not convinced: running samtools idxstats produces perfectly normal statistics for both BAMs, despite printing the same warning:
samtools idxstats tumordna.bam
Warning: The index file is older than the data file: tumordna.bai
1       248956422       11392213        7879
2       242193529       8628419         5913
3       198295559       9399751         6165
4       190214555       4886955         3337
5       181538259       5489974         3893
6       170805979       6026849         4186
7       159345973       5836764         4164
8       145138636       4151537         3004
9       138394717       4802964         3449
10      133797422       4713127         3375
11      135086622       6878823         4761
12      133275309       5923137         4352
13      114364328       2254480         1557
14      107043718       4217733         2737
15      101991189       3937463         2725
16      90338345        3639656         2746
17      83257441        5696854         4281
18      80373285        1800628         1247
19      58617616        7705012         6037
20      64444167        2690292         2064
21      46709983        1539867         1191
22      50818468        1503402         1285
X       156040895       2423614         1809
Y       57227415        49893           182
MT      16569           76847           458
*       0               0               7620

Andy Rimmer

Aug 3, 2017, 11:53:45 AM
to ni...@cc-tdi.org, Platypus Users
As far as I can see the timestamp shouldn't matter, as long as the index file is valid. But I'm not that familiar with the htslib code, so maybe try re-indexing them anyway? Also, check that there aren't any other index files lying around in the same directories (i.e. the .bam.bai or .csi files that htslib might use).

I think (not 100% sure) that samtools idxstats just looks at the index file, and doesn't query the actual BAM, so it may not show up any problems that are due to the BAM and index being out of sync.
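
A rough way to check for both issues (an index older than its BAM, and extra index files in the same directory) is sketched below. This is only an illustration: the helper names are hypothetical, and the list of file-name patterns is an assumption based on the .bai, .bam.bai and .csi names mentioned in this thread, not a statement of what htslib actually searches for.

```python
import os


def candidate_indexes(bam_path):
    """Return existing index files that might be used for this BAM."""
    stem = os.path.splitext(bam_path)[0]
    # Assumed naming patterns: sample.bam.bai, sample.bam.csi,
    # sample.bai, sample.csi
    names = [
        bam_path + ".bai",
        bam_path + ".csi",
        stem + ".bai",
        stem + ".csi",
    ]
    # dict.fromkeys drops duplicates while keeping order
    return [n for n in dict.fromkeys(names) if os.path.exists(n)]


def stale_indexes(bam_path):
    """Return index files whose modification time is older than the BAM's."""
    bam_mtime = os.path.getmtime(bam_path)
    return [
        idx for idx in candidate_indexes(bam_path)
        if os.path.getmtime(idx) < bam_mtime
    ]
```

If `stale_indexes("tumordna.bam")` is non-empty, re-indexing with `samtools index tumordna.bam` is the obvious next step. Note that this command writes tumordna.bam.bai by default, so an older tumordna.bai can be left behind and may be worth deleting.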

Kind regards,
Andy
 


ni...@cc-tdi.org

Aug 3, 2017, 12:42:43 PM
to Platypus Users, ni...@cc-tdi.org
Reindexing the BAM worked! I have a couple more questions:
1. By setting --genIndels=1 and --genSNPs=0, am I only calling indels?
2. Unless I set the posterior probability in somaticMutationDetector.py to 0, I'm not getting any indels. If I instead filter out the low-quality calls using the FILTER column of the VCF, would this still work for calling the somatic mutations between the cancer and normal samples?

Andy Rimmer

Aug 7, 2017, 9:54:24 AM
to ni...@cc-tdi.org, Platypus Users
Hi Nick,

1. Yes, this will turn off SNP detection in Platypus. In some cases this may affect the quality of the indel calls, if there are indels close to SNPs, but this should be a small effect.
2. The script treats SNPs and indels in the same way, so I'm not sure why you don't see any. It could be that there aren't many real somatic indels in your samples. If you do as you are suggesting, and filter out low quality calls but set the posterior to 0, you should be able to see if there are any good candidates.
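
The FILTER-column step could look roughly like the sketch below. It is not part of Platypus or somaticMutationDetector.py; it assumes a standard tab-separated VCF in which the seventh column is FILTER and good calls are marked PASS, and the function names are made up for illustration.

```python
def is_indel(ref, alt):
    """True if any ALT allele differs in length from REF."""
    return any(len(allele) != len(ref) for allele in alt.split(","))


def passing_indels(vcf_lines):
    """Yield (chrom, pos, ref, alt) for indel records whose FILTER is PASS."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, _id, ref, alt, _qual, filt = fields[:7]
        if filt == "PASS" and is_indel(ref, alt):
            yield chrom, int(pos), ref, alt
```

For example, `list(passing_indels(open("indels_matched.vcf")))` would give the candidates to inspect by eye. The length check is only a safeguard in case equal-length replacements appear in the output.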

Kind regards,
Andy


