vcfeval: Error: An IO problem occurred: "Expected file missing"


Charles Warden

Aug 14, 2023, 7:42:14 PM
to RTG Users
Hi,

I am testing vcfeval based on some feedback from a Biostars discussion.

To start, I am comparing a VCF to itself, since the discussion concerned that situation and whether the calculated F1 score was equal to 1.

The specified output folder was created, and I have attached a copy of the vcfeval.log file from within that folder.

The output captured from the command was: Error: An IO problem occurred: "Expected file missing"

The command being run is as follows:

TEST=../../GIAB-hg38_latest-230726/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
REF=../../GIAB-hg38_latest-230726/HG002_GRCh38_1_22_v4.2.1_benchmark.vcf.gz
OUT=CALLABLE-GAIB_v4.2.1-vs-GAIB_v4.2.1
SDF=../../../EPI2ME/giab_lsk114_2022.12/copied_files/analysis/benchmarking/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna.sdf
BED=../../GIAB-hg38_latest-230726/SupplementaryFiles/HG002_GRCh38_1_22_v4.2.1_callablemultinter_gt0.bed
THREADS=4

/path/to/rtg-tools-3.12.1/rtg vcfeval -T $THREADS -Z --bed-regions=$BED -b $REF -c $TEST -o $OUT -t $SDF

Can you please help me determine which file or files are not being recognized (or what other issue is occurring)?

Having checked with the less and ls commands, I believe all of the file paths are correct.

Thank you very much!

Sincerely,
Charles
vcfeval.log

Len Trigg

Aug 14, 2023, 9:36:50 PM
to Charles Warden, RTG Users
Hi Charles,

That error is produced when the SDF is corrupted (in particular, when one of the files within the SDF directory is missing). I would suggest using rtg format to re-create the reference SDF from your original FASTA file.

Cheers,
Len.



Charles Warden

Aug 15, 2023, 1:17:17 PM
to RTG Users, Len Trigg, Charles Warden
Hi Len,

Thank you very much for your prompt response.

I had been using an SDF directory downloaded from Nanopore for this dataset.

I have gone back and created a new SDF from the original reference FASTA file using the following commands:

FA=../../../EPI2ME/giab_lsk114_2022.12/copied_files/analysis/benchmarking/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
SDF=GCA_000001405.15_GRCh38_no_alt_analysis_set.sdf

/path/to/rtg format -o $SDF $FA


I then ran rtg vcfeval with this new SDF instead of the provided one.

This resolved that particular error, so thank you!

I believe this is the main output summary when that VCF is compared to itself:

Threshold  True-pos-baseline  True-pos-call  False-pos  False-neg  Precision  Sensitivity  F-measure
----------------------------------------------------------------------------------------------------
   25.000            4022526        4022526          0      17539     1.0000       0.9957     0.9978
     None            4040028        4040028         37         37     1.0000       1.0000     1.0000
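As a cross-check on whether F1 really equals 1 here, the summary columns can be recomputed from the raw counts. This is a minimal sketch, not RTG's own code: it assumes the metric definitions given in the RTG Tools documentation (Precision = TP-call / (TP-call + FP), Sensitivity = TP-baseline / (TP-baseline + FN), F-measure = their harmonic mean), and the function name vcfeval_metrics is hypothetical.

```python
# Hypothetical helper (not part of RTG Tools) that recomputes vcfeval summary
# metrics, assuming these definitions:
#   Precision   = TP-call / (TP-call + FP)
#   Sensitivity = TP-baseline / (TP-baseline + FN)
#   F-measure   = harmonic mean of Precision and Sensitivity


def vcfeval_metrics(tp_baseline, tp_call, false_pos, false_neg):
    """Return (precision, sensitivity, f_measure) for one summary row."""
    precision = tp_call / (tp_call + false_pos)
    sensitivity = tp_baseline / (tp_baseline + false_neg)
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)
    return precision, sensitivity, f_measure


# Counts copied from the summary table above:
# threshold -> (True-pos-baseline, True-pos-call, False-pos, False-neg)
rows = {
    "25.000": (4022526, 4022526, 0, 17539),
    "None": (4040028, 4040028, 37, 37),
}

for threshold, counts in rows.items():
    p, s, f = vcfeval_metrics(*counts)
    # Four decimal places, matching the vcfeval summary layout.
    print(f"{threshold:>8}  {p:.4f}  {s:.4f}  {f:.4f}")
    # Full precision shows what the rounding hides.
    print(f"{'':>8}  unrounded F-measure = {f:.7f}")
```

Rounded to four decimals, this reproduces the table (1.0000 / 0.9957 / 0.9978 and 1.0000 / 1.0000 / 1.0000). The unrounded F-measure for the None row is about 0.99999 rather than exactly 1, because the 37 false positives and 37 false negatives are hidden by the rounding.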

Again, thank you very much!

Sincerely,
Charles
