repeat sample names


Anthony D. Long

Aug 28, 2019, 1:14:07 PM
to STITCH imputation
I am getting an error message saying that the sample names are not unique. But the names in my input file are unique, so I do not understand why I am getting this error.

./STITCH/STITCH.R --chr="ChrX" --bamlist="newer.bamlist.txt" --posfile="/share/adl/tdlong/mouse_GWAS/data/vcf/P.leu_X_uniq.pos" --outputdir="STI8/ChrX" --K=8 --nGen=60 --nCores=16 --output_haplotype_dosages=TRUE

[2019-08-28 10:06:47] Running STITCH(chr = ChrX, nGen = 60, posfile = /share/adl/tdlong/mouse_GWAS/data/vcf/P.leu_X_uniq.pos, K = 8, S = 1, outputdir = STI8/ChrX, nStarts = , tempdir = NA, bamlist = newer.bamlist.txt, cramlist = , sampleNames_file = , reference = , genfile = , method = diploid, output_format = bgvcf, B_bit_prob = 16, outputInputInVCFFormat = FALSE, downsampleToCov = 50, downsampleFraction = 1, readAware = TRUE, chrStart = NA, chrEnd = NA, regionStart = NA, regionEnd = NA, buffer = NA, maxDifferenceBetweenReads = 1000, maxEmissionMatrixDifference = 1e+10, alphaMatThreshold = 1e-04, emissionThreshold = 1e-04, iSizeUpperLimit = 600, bqFilter = 17, niterations = 40, shuffleHaplotypeIterations = c(4, 8, 12, 16), splitReadIterations = 25, nCores = 16, expRate = 0.5, maxRate = 100, minRate = 0.1, Jmax = 1000, regenerateInput = TRUE, originalRegionName = NA, keepInterimFiles = FALSE, keepTempDir = FALSE, outputHaplotypeProbabilities = FALSE, switchModelIteration = NA, generateInputOnly = FALSE, restartIterations = NA, refillIterations = c(6, 10, 14, 18), downsampleSamples = 1, downsampleSamplesKeepList = NA, subsetSNPsfile = NA, useSoftClippedBases = FALSE, outputBlockSize = 1000, outputSNPBlockSize = 10000, inputBundleBlockSize = NA, genetic_map_file = , reference_haplotype_file = , reference_legend_file = , reference_sample_file = , reference_populations = NA, reference_phred = 20, reference_iterations = 40, reference_shuffleHaplotypeIterations = c(4, 8, 12, 16), output_filename = NULL, initial_min_hapProb = 0.2, initial_max_hapProb = 0.8, regenerateInputWithDefaultValues = FALSE, plotHapSumDuringIterations = FALSE, plot_shuffle_haplotype_attempts = FALSE, plotAfterImputation = TRUE, save_sampleReadsInfo = FALSE, gridWindowSize = NA, shuffle_bin_nSNPs = NULL, shuffle_bin_radius = 5000, keepSampleReadsInRAM = FALSE, useTempdirWhileWriting = FALSE, output_haplotype_dosages = TRUE)
[2019-08-28 10:06:47] Program start
[2019-08-28 10:06:47] Get and validate pos and gen
[2019-08-28 10:06:48] Done get and validate pos and gen
[2019-08-28 10:06:48] Get BAM sample names
[2019-08-28 10:06:48] Done getting BAM sample names
Error in get_sample_names(bamlist = bamlist, cramlist = cramlist, nCores = nCores,  :
  There are repeat sample names
Calls: STITCH -> get_sample_names
In addition: Warning message:
In mclapply(files, mc.cores = nCores, get_sample_name_from_bam_file_using_SeqLib) :
  all scheduled cores encountered errors in user code
Execution halted

cat newer.bamlist.txt
bam/merge/11001.rmdup.bam
bam/merge/11002.rmdup.bam
bam/merge/11003.rmdup.bam
bam/merge/18923.rmdup.bam
bam/merge/18929.rmdup.bam
bam/merge/18950.rmdup.bam
bam/merge/18953.rmdup.bam
bam/merge/18957.rmdup.bam
bam/merge/19037.rmdup.bam
bam/merge/19046.rmdup.bam
...
bam/merge/reference.RG.q30.bam.rmdup.bam
bam/merge/s37.rmdup.bam
bam/merge/s38.rmdup.bam
bam/merge/s39.rmdup.bam
...

cat newer.bamlist.txt | sort | uniq -c
All names are unique.
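The `sort | uniq -c` check above only shows that the BAM file paths are unique. By default, STITCH takes each sample's name from the SM tag in the BAM header instead (see the replies below), so that is what needs checking. A minimal sketch for inspecting the headers, assuming `samtools` is on the PATH; `sm_from_header`, `sm_table` and `repeated_sm` are hypothetical helper names, not part of STITCH:

```bash
# Sketch: report the SM (sample name) tag(s) in each BAM header and flag
# repeats. Assumes samtools is on PATH; all function names are hypothetical.

# Read SAM header text on stdin and print the distinct SM values found on
# @RG lines, joined by commas (empty output if there is no SM tag).
sm_from_header() {
  awk -F'\t' '$1 == "@RG" {
    for (i = 2; i <= NF; i++) if ($i ~ /^SM:/) print substr($i, 4)
  }' | sort -u | paste -sd, -
}

# For each path in a bamlist file, print "<bam><TAB><SM values or MISSING>".
sm_table() {
  local bam sm
  while IFS= read -r bam; do
    [ -n "$bam" ] || continue
    sm=$(samtools view -H "$bam" | sm_from_header)
    printf '%s\t%s\n' "$bam" "${sm:-MISSING}"
  done < "$1"
}

# Read sm_table output on stdin; print any SM value shared by several BAMs.
repeated_sm() {
  cut -f2 | sort | uniq -d
}
```

For example, `sm_table newer.bamlist.txt | repeated_sm` prints any SM value used by two or more BAMs. BAMs with no SM tag are reported as MISSING, so several untagged BAMs show up as a repeated MISSING.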

Anthony D. Long

Aug 28, 2019, 1:39:25 PM
to STITCH imputation
I seem to have figured it out myself.

I either have to give a list of sample names using something like:

--sampleNames_file="newer.samplenames.txt"

Or add the sample names to the BAM headers. I think the error was thrown because I had not put sample names in the BAM file headers; the sample names only appeared in the BAM file names themselves.

Tony
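A minimal sketch of the first fix. It assumes that `--sampleNames_file` expects one sample name per line, in the same order as the bamlist (check the STITCH documentation for your version). It also assumes a hypothetical naming rule where each sample name is the BAM file name without its directory and without the `.rmdup.bam` / `.bam` suffix. `names_from_bamlist` and `check_unique` are illustrative helpers:

```bash
# Derive one sample name per BAM path read on stdin, keeping bamlist order.
# Hypothetical rule: strip the directory, then a trailing ".rmdup.bam",
# then any remaining trailing ".bam".
names_from_bamlist() {
  sed -e 's#.*/##' -e 's#\.rmdup\.bam$##' -e 's#\.bam$##'
}

# Read names on stdin; return 1 and list the offenders on stderr if any
# name occurs more than once, otherwise return 0.
check_unique() {
  local dups
  dups=$(sort | uniq -d)
  if [ -n "$dups" ]; then
    printf 'repeat sample names:\n%s\n' "$dups" >&2
    return 1
  fi
}
```

For example, run `names_from_bamlist < newer.bamlist.txt > newer.samplenames.txt && check_unique < newer.samplenames.txt`, then rerun STITCH with `--sampleNames_file="newer.samplenames.txt"`. Under this rule `reference.RG.q30.bam.rmdup.bam` becomes `reference.RG.q30`.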


Robbie Davies

Aug 29, 2019, 7:36:36 AM
to Anthony D. Long, STITCH imputation
OK, great! Sorry for the confusion. Yes, that sounds likely: by default, STITCH assumes that the SM tag is set in the read group (@RG) lines of each BAM header and is unique across all samples. Either approach should fix this situation.

If you have any more trouble, let me know.
Best,
Robbie
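A minimal sketch of the second fix, which writes the sample name into the SM tag of a BAM header. It assumes samtools is installed and that its `addreplacerg` subcommand accepts an `@RG` line with escaped `\t` separators via `-r` (check `samtools addreplacerg` help for your version). `add_sm_tag` is a hypothetical helper, and reusing the sample name as the read-group ID is an arbitrary choice:

```bash
# Sketch: write a copy of a BAM whose header has an @RG line with ID and SM
# both set to the given sample name, then index the copy.
# Assumptions: samtools addreplacerg exists and, in its default mode, tags
# the reads with the new read group; add_sm_tag is a hypothetical name.
add_sm_tag() {
  local in_bam=$1 sample=$2 out_bam=$3
  # Inside double quotes, \t stays a literal backslash-t for samtools to parse.
  samtools addreplacerg \
    -r "@RG\tID:${sample}\tSM:${sample}" \
    -o "$out_bam" "$in_bam" || return 1
  samtools index "$out_bam"
}
```

For example, `add_sm_tag bam/merge/11001.rmdup.bam 11001 bam/merge/11001.rg.bam` (the `.rg.bam` output name is hypothetical). Repeat this for every BAM, then point `--bamlist` at a list of the rewritten files.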
 
