bufio.Scanner: token too long kills elprep sfm on filter step

A. Reese

Mar 22, 2024, 9:34:28 AM
to elprep

Hello. I am unable to run elprep sfm because of a bufio panic during the elprep filter step. I had no issues with either the vcf-to-elsites or the fasta-to-elfasta command. Please see below.

$ which go && go version
~/.conda/envs/WES_dev2/bin/go
go version go1.17.6 linux/amd64

$ which elprep && elprep
~/.conda/envs/WES_dev2/bin/elprep
elprep version 5.1.3 compiled with go1.17.6

  • The command I used:
    elprep sfm $sam_path $bam_path
    --output-type bam
    --replace-read-group "ID:group1 LB:1 PL:illumina PU:unit1 SM:sample"
    --mark-duplicates
    --mark-optical-duplicates $dupmetrics
    --remove-duplicates
    --sorting-order coordinate
    --bqsr $cal_tbl_path
    --known-sites $dbsnp_elsites
    --target-regions $bed_file
    --reference $elref_path
    --haplotypecaller $vcf_out
    --timed

  • Output from my terminal:
    elprep version 5.1.3 compiled with go1.17.6 - see http://github.com/exascience/elprep for more information.

2024/03/21 10:44:31 Created log file at /home/areese/logs/elprep/elprep-2024-03-21-10-44-31-633572959-EDT.log
2024/03/21 10:44:31 Command line: [elprep sfm /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/debug.reads_to_hg38.P14.sam /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/HTB-2D-TR1_pilotWES_reads_to_hg38.P14.processed.bam --output-type bam --replace-read-group ID:group1 LB:1 PL:illumina PU:unit1 SM:sample --mark-duplicates --mark-optical-duplicates /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/reads_to_hg38.P14.markedDuplicatesMetrics.txt --remove-duplicates --sorting-order coordinate --bqsr /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/bqsr_recalibration.tbl --known-sites /home/shared/databases/dbsnp/Homo_sapiens_assembly38.dbsnp138.elsites --target-regions /home/shared/projects/WES/hg38_Twist_ILMN_Exome_2.0_Plus_Panel_Combined_Mito.UCSC.bed --reference /home/shared/projects/RNA-seq/references/UCSC/latest/hg38.P14.elfasta --haplotypecaller /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/HTB-2D-TR1_pilotWES_reads_to_hg38.P14.vcf --timed]
2024/03/21 10:44:31 Executing command:
elprep sfm /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/debug.reads_to_hg38.P14.sam /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/HTB-2D-TR1_pilotWES_reads_to_hg38.P14.processed.bam --output-type bam --replace-read-group ID:group1 LB:1 PL:illumina PU:unit1 SM:sample --mark-duplicates --mark-optical-duplicates /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/reads_to_hg38.P14.markedDuplicatesMetrics.txt --optical-duplicates-pixel-distance 100 --remove-duplicates --bqsr /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/bqsr_recalibration.tbl --reference /home/shared/projects/RNA-seq/references/UCSC/latest/hg38.P14.elfasta --quantize-levels 0 --max-cycle 500 --known-sites /home/shared/databases/dbsnp/Homo_sapiens_assembly38.dbsnp138.elsites --target-regions /home/shared/projects/WES/hg38_Twist_ILMN_Exome_2.0_Plus_Panel_Combined_Mito.UCSC.bed --haplotypecaller /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/HTB-2D-TR1_pilotWES_reads_to_hg38.P14.vcf --sorting-order coordinate --timed --intermediate-files-output-prefix debug.reads_to_hg38.P14 --intermediate-files-output-type sam
2024/03/21 10:44:31 Splitting...
2024/03/21 10:53:23 Filtering (phase 1)...
2024/03/21 10:53:24 exit status 2

2024/03/21 10:53:23 Created log file at /home/areese/logs/elprep/elprep-2024-03-21-10-53-23-776329743-EDT.log
2024/03/21 10:53:23 Command line: [elprep filter /home/shared/repos/RNA-seq_develop/elprep-splits-f58c8369-975d-439f-8426-5b564ed24d3b/splits/debug.reads_to_hg38.P14-unmapped.sam /home/shared/repos/RNA-seq_develop/elprep-splits-processed-f58c8369-975d-439f-8426-5b564ed24d3b/debug.reads_to_hg38.P14-unmapped.sam --replace-read-group ID:group1 LB:1 PL:illumina PU:unit1 SM:sample --mark-duplicates --remove-duplicates --reference /home/shared/projects/RNA-seq/references/UCSC/latest/hg38.P14.elfasta --max-cycle 500 --known-sites /home/shared/databases/dbsnp/Homo_sapiens_assembly38.dbsnp138.elsites --sorting-order coordinate --timed --pg-cmd-line elprep sfm /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/debug.reads_to_hg38.P14.sam /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/HTB-2D-TR1_pilotWES_reads_to_hg38.P14.processed.bam --output-type bam --replace-read-group ID:group1 LB:1 PL:illumina PU:unit1 SM:sample --mark-duplicates --mark-optical-duplicates /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/reads_to_hg38.P14.markedDuplicatesMetrics.txt --optical-duplicates-pixel-distance 100 --remove-duplicates --bqsr /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/bqsr_recalibration.tbl --reference /home/shared/projects/RNA-seq/references/UCSC/latest/hg38.P14.elfasta --quantize-levels 0 --max-cycle 500 --known-sites /home/shared/databases/dbsnp/Homo_sapiens_assembly38.dbsnp138.elsites --target-regions /home/shared/projects/WES/hg38_Twist_ILMN_Exome_2.0_Plus_Panel_Combined_Mito.UCSC.bed --haplotypecaller /home/shared/projects/WES/PilotStudy/ILN2_2024AE/HTB-2D-TR1_elprep/map_reads/HTB-2D-TR1_pilotWES_reads_to_hg38.P14.vcf --sorting-order coordinate --timed --intermediate-files-output-prefix debug.reads_to_hg38.P14 --intermediate-files-output-type sam --bqsr-tables-only 
/home/shared/repos/RNA-seq_develop/elprep-tabs-f58c8369-975d-439f-8426-5b564ed24d3b/debug.reads_to_hg38.P14-unmapped.sam.elrecal --mark-optical-duplicates-intermediate /home/shared/repos/RNA-seq_develop/elprep-metrics-f58c8369-975d-439f-8426-5b564ed24d3b/debug.reads_to_hg38.P14-unmapped.sam --optical-duplicates-pixel-distance 100 --target-regions /home/shared/projects/WES/hg38_Twist_ILMN_Exome_2.0_Plus_Panel_Combined_Mito.UCSC.bed]
2024/03/21 10:53:24 bufio.Scanner: token too long
panic: bufio.Scanner: token too long

goroutine 1 [running]:
log.Panic({0xc0001c5458, 0xc002331890, 0xc002333750})
/opt/conda/conda-bld/elprep_1651164620400/_build_env/go/src/log/log.go:354 +0x65
github.com/exascience/elprep/v5/bed.ParseBed({0x7ffd3bb9ba8c, 0xc00011e030})
/opt/conda/conda-bld/elprep_1651164620400/work/bed/bed-files.go:57 +0x56b
github.com/exascience/elprep/v5/cmd.Filter()
/opt/conda/conda-bld/elprep_1651164620400/work/cmd/filter.go:738 +0x353f
main.main()
/opt/conda/conda-bld/elprep_1651164620400/work/main.go:67 +0x245

It looks like this happens during BED file parsing (bed.ParseBed in the stack trace). Does the BED file need to be converted into an elprep-specific file type as well?
Thanks.