FAIL_IO: Failed to read the record size

Mario Saare

Mar 29, 2017, 8:55:03 AM

to verifyBamID

Dear all,

I ran verifyBamID with the options 'verifyBamID --vcf "$vcfFile" --bam "$input" --out ${output}/${prefix} --verbose --ignoreRG' and got the error message given below:

Finished reading 1185737 markers from VCF file
Total of 596232 informative markers passed after AF >= 0.010000 and callRate >= 0.500000 threshold
Reading BAM file Sample1.Aligned.sortedByCoord.out.bam
Exiting due to ERROR:
FAIL_IO: Failed to read the record size, Sample1.Aligned.sortedByCoord.out.bam.

It seems that the VCF file was read in correctly, but the BAM file, which comes from an RNA-seq experiment, gave an error. I googled the error message, but I could not find what it refers to. Has anyone else gotten this message, and how can it be solved? Is it related to the BAM file itself or, for example, to a lack of computational resources (RAM) to properly process the data?

I ran it on a computing cluster with 1 CPU and 30 GB RAM.
The VCF file was filtered beforehand: only PASS calls, AC > 0, low complexity regions removed, genotype call rate > 0.9 and Hardy-Weinberg equilibrium p-value > 1e-9.
The BAM file was created with the STAR aligner (--readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --twopassMode Basic), contains the SM tag in the @RG field, and is coordinate sorted and indexed. However, the duplicates have been removed and no base quality recalibration has been done, meaning that it does not exactly match the requirements listed in the verifyBamID wiki. Can this cause the error that I have? I would appreciate any feedback! Thanks in advance!

Best regards,
Mario Saare

Mary Kate Wing

Apr 6, 2017, 11:57:04 PM

to verif...@googlegroups.com

You are correct, it does appear that the BAM file is generating that error.

Each BAM record should start with a 32-bit block size. The error message you see means that VerifyBamID tried to read the block size of a BAM record and could not read all 32 bits, even though it had not hit the end of the file. So basically the read returned fewer bytes than expected for some reason.
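
To make the record layout concrete, here is a minimal Python sketch that walks a decompressed BAM stream and reports where the first short read occurs. It is not how VerifyBamID or bam validate are implemented; the helper names (read_exact, skip_header, count_records) and the status strings are made up for illustration. It only relies on the layout described in the SAM/BAM specification: a header, followed by records that each begin with a little-endian 32-bit block size.

```python
import gzip
import io
import struct


def read_exact(stream, n):
    """Read up to n bytes, looping because read() may return fewer."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            break  # end of stream
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def skip_header(stream):
    """Skip the BAM header: magic, SAM text, and reference list."""
    if read_exact(stream, 4) != b"BAM\x01":
        raise ValueError("not a BAM stream")
    (l_text,) = struct.unpack("<i", read_exact(stream, 4))
    read_exact(stream, l_text)
    (n_ref,) = struct.unpack("<i", read_exact(stream, 4))
    for _ in range(n_ref):
        (l_name,) = struct.unpack("<i", read_exact(stream, 4))
        read_exact(stream, l_name + 4)  # name, then int32 reference length


def count_records(stream):
    """Return (records_read, status) for a decompressed BAM stream."""
    skip_header(stream)
    n = 0
    while True:
        size_bytes = read_exact(stream, 4)
        if len(size_bytes) == 0:
            return n, "ok"  # clean end of stream between records
        if len(size_bytes) < 4:
            return n, "short block_size"  # fewer than 32 bits available
        (block_size,) = struct.unpack("<i", size_bytes)
        if block_size < 0:
            return n, "bad block_size"
        body = read_exact(stream, block_size)
        if len(body) < block_size:
            return n, "short record"  # record body cut off
        n += 1
```

For a file on disk, something like `with gzip.open("Sample1.Aligned.sortedByCoord.out.bam", "rb") as fh: print(count_records(fh))` should work, since BGZF blocks are valid gzip members. Note that Python's gzip module may raise EOFError by itself if the compressed data is cut off in the middle of a block, before this loop ever sees a short read.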

Duplicate removal and base quality recalibration should not affect this error message.

To sanity check that the library can read the BAM file on its own, I'd try running bam validate:

http://genome.sph.umich.edu/wiki/BamUtil:_validate

./bam validate --in Sample1.Aligned.sortedByCoord.out.bam --so_coord --verbose

Let me know if the validation succeeds. It could be a lack of resources issue that is causing the file read to fail, but I think it is worth verifying that our tools can successfully read just the BAM file on its own.

Mary Kate Wing

--
You received this message because you are subscribed to the Google Groups "verifyBamID" group.
To unsubscribe from this group and stop receiving emails from it, send an email to verifybamid+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

huilei xu

Aug 28, 2018, 9:36:46 AM

to verifyBamID

I got the results below from bam validate (verifyBamID failed to finish running on this BAM file). What does FAIL_IO mean? Is there a way to fix it? Thanks.

Record 96346444:
FAIL_IO: Failed to read the record, xxx.bam.

Number of records read = 96346444
Number of valid records = 96346444

TotalReads(e6) 96.35
MappedReads(e6) 96.30
PairedReads(e6) 96.35
ProperPair(e6) 94.95
DuplicateReads(e6) 24.48
QCFailureReads(e6) 0.00

MappingRate(%) 99.96
PairedReads(%) 100.00
ProperPair(%) 98.55
DupRate(%) 25.41
QCFailRate(%) 0.00

TotalBases(e6) 14424.15
BasesInMappedReads(e6) 14417.66

Returning: 3 (FAIL_IO)

Hyun Min Kang

Aug 28, 2018, 9:55:26 AM

to verif...@googlegroups.com

This means that reading the BAM file itself failed, so it is more likely a disk problem.
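
One quick way to tell a truncated or partially copied file apart from other read failures is to check whether the BAM still ends with the 28-byte empty BGZF block that the SAM/BAM specification defines as the end-of-file marker. The sketch below is only an illustration of that idea; the function names are made up, and a missing marker does not by itself prove a disk fault (some older tools did not write it).

```python
import os

# Empty BGZF block used as the end-of-file marker (SAM/BAM specification).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43"
    " 02 00 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def tail_is_bgzf_eof(tail):
    """True if the given bytes end with the BGZF EOF marker."""
    return tail.endswith(BGZF_EOF)


def has_bgzf_eof(path):
    """True if the file at `path` ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        fh.seek(-len(BGZF_EOF), os.SEEK_END)  # jump to the last 28 bytes
        return tail_is_bgzf_eof(fh.read())
```

If has_bgzf_eof("xxx.bam") returns False, the file was probably cut short when it was written or copied, and regenerating or re-copying it is worth trying. `samtools quickcheck` performs a similar check. If the marker is present and the read still fails, the storage itself becomes the more likely suspect.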

Hyun.
