If I use a sampleNames_file, do the BAM files need to specify SM= in the header?

Boo Guy

Mar 2, 2021, 10:24:06 AM
to STITCH imputation
Hi,

If I use a sampleNames_file, do the BAM files need to specify SM= in the header?
Currently my indexed .bam files have no SM= specified.

Currently I am getting the following error and cannot figure it out.

Program start
[2021-03-01 22:24:29] Get and validate pos and gen
Error in names(x) <- value : 'names' attribute [4] must be the same length as the vector [1]

Any ideas?

Thanks 

Guy


I am using the latest version of STITCH, and the test dataset works for me.

head of bamlist 
/home/reeves/oldbamtest/bams/GuysOldBam/b106_10.bam
/home/reeves/oldbamtest/bams/GuysOldBam/b106_19.bam
/home/reeves/oldbamtest/bams/GuysOldBam/b106_30.bam

head of sampleNames_file
b106_10
b106_19
b106_30


[2021-03-01 22:24:29] Running STITCH(chr = chr2L, nGen = 11, posfile = posfileREF.txt, K = 4, S = 1, outputdir = /home/reeves/oldbamtest/, nStarts = , tempdir = /tmp/Rtmpflbp7l, bamlist = listOfFiles_GuysOldBam.txt, cramlist = , sampleNames_file = names_GuysOldBam.txt, reference = , genfile = , method = diploid, output_format = bgvcf, B_bit_prob = 16, outputInputInVCFFormat = TRUE, downsampleToCov = 50, downsampleFraction = 1, readAware = TRUE, chrStart = NA, chrEnd = NA, regionStart = NA, regionEnd = NA, buffer = NA, maxDifferenceBetweenReads = 1000, maxEmissionMatrixDifference = 10000000000, alphaMatThreshold = 0.0001, emissionThreshold = 0.0001, iSizeUpperLimit = 600, bqFilter = 17, niterations = 40, shuffleHaplotypeIterations = c(4, 8, 12, 16), splitReadIterations = 25, nCores = 12, expRate = 0.5, maxRate = 100, minRate = 0.1, Jmax = 1000, regenerateInput = TRUE, originalRegionName = NA, keepInterimFiles = FALSE, keepTempDir = FALSE, outputHaplotypeProbabilities = FALSE, switchModelIteration = NA, generateInputOnly = FALSE, restartIterations = NA, refillIterations = c(6, 10, 14, 18), downsampleSamples = 1, downsampleSamplesKeepList = NA, subsetSNPsfile = NA, useSoftClippedBases = FALSE, outputBlockSize = 1000, outputSNPBlockSize = 10000, inputBundleBlockSize = NA, genetic_map_file = , reference_haplotype_file = , reference_legend_file = , reference_sample_file = , reference_populations = NA, reference_phred = 20, reference_iterations = 40, reference_shuffleHaplotypeIterations = c(4, 8, 12, 16), output_filename = NULL, initial_min_hapProb = 0.2, initial_max_hapProb = 0.8, regenerateInputWithDefaultValues = FALSE, plotHapSumDuringIterations = FALSE, plot_shuffle_haplotype_attempts = FALSE, plotAfterImputation = TRUE, save_sampleReadsInfo = FALSE, gridWindowSize = NA, shuffle_bin_nSNPs = NULL, shuffle_bin_radius = 5000, keepSampleReadsInRAM = FALSE, useTempdirWhileWriting = FALSE, output_haplotype_dosages = FALSE, use_bx_tag = TRUE, bxTagUpperLimit = 50000) 
[2021-03-01 22:24:29] Program start
[2021-03-01 22:24:29] Get and validate pos and gen
Error in names(x) <- value : 'names' attribute [4] must be the same length as the vector [1]


Robbie Davies

Mar 3, 2021, 9:39:56 AM
to Boo Guy, STITCH imputation
Hi Guy,

You should be able to use sampleNames_file to avoid specifying SM= in the header. I just tried and it's working for me.

The error message suggests the posfile is malformed? Could you check that it's tab separated and otherwise matches the format of the test data?
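
One quick way to run that check is a minimal Python sketch like the one below. It is not part of STITCH, and `check_posfile_lines` is a hypothetical helper name. It assumes the posfile has four tab-separated columns (chromosome, position, reference base, alternate base) and no header, as in the test data, and it reports every line that breaks that layout or is not a single-base SNP.

```python
def check_posfile_lines(lines):
    """Return (line_number, problem) tuples for lines that do not look like
    'chr<TAB>pos<TAB>ref<TAB>alt' with single-base ref and alt alleles."""
    problems = []
    for n, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            # A space-separated line ends up as a single field here.
            problems.append((n, f"expected 4 tab-separated fields, found {len(fields)}"))
            continue
        chrom, pos, ref, alt = fields
        if not pos.isdigit():
            problems.append((n, f"position {pos!r} is not an integer"))
        if len(ref) != 1 or len(alt) != 1:
            problems.append((n, f"not a single-base SNP: ref={ref!r}, alt={alt!r}"))
    return problems


# Made-up example lines: one valid, one space-separated, one indel.
example = [
    "chr2L\t5000\tA\tG\n",
    "chr2L 5100 C T\n",
    "chr2L\t5200\tAT\tA\n",
]
for n, problem in check_posfile_lines(example):
    print(n, problem)
```

To check a real file, pass the open file handle instead of the example list, e.g. `check_posfile_lines(open("posfileREF.txt"))`.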

Best,
Robbie


Boo Guy

Mar 3, 2021, 10:06:27 AM
to STITCH imputation
Dear Robbie 

As you suspected, the posfile was not tab-separated. Once I fixed that and also removed all the indels, so that only SNPs were present, it started to run.
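
For anyone hitting the same error, the cleanup can be sketched roughly as follows. This is an illustrative Python snippet rather than the exact commands used here, and `clean_posfile_lines` is a hypothetical name. It assumes a four-column posfile (chromosome, position, reference base, alternate base) whose fields are separated by spaces or tabs, rewrites each line with tabs, and keeps only single-base SNPs.

```python
def clean_posfile_lines(lines):
    """Re-join whitespace-separated posfile lines with tabs and drop indels."""
    cleaned = []
    for n, line in enumerate(lines, start=1):
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 4:
            raise ValueError(f"line {n}: expected 4 fields, found {len(fields)}")
        chrom, pos, ref, alt = fields
        # Keep only sites where both alleles are a single base (SNPs).
        if len(ref) == 1 and len(alt) == 1:
            cleaned.append("\t".join(fields) + "\n")
    return cleaned


# Made-up example: a space-separated SNP, an indel, and a tab-separated SNP.
raw = [
    "chr2L 5000 A G\n",
    "chr2L  5100  AT  A\n",
    "chr2L\t5200\tC\tT\n",
]
print("".join(clean_posfile_lines(raw)), end="")
```

The function raises on lines with the wrong number of fields rather than skipping them, so a badly formatted file is noticed instead of being silently truncated.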


Thanks 

Guy
