Re: [dartR] gl.sfs

17 views
Skip to first unread message

Bernd.Gruber

unread,
May 14, 2026, 8:21:27 PM (6 days ago) May 14
to da...@googlegroups.com
Hi Mathew,

the function was not really well tested with higher ploidys than two (I have no higher ploidy data). In principle it should work (there is an edge case where it breaks,  if you have missing values in the corresponding folded and unfolded bins and it does not work in terms of naming bins at higher ploidy).

So can you do me a favour and try using:

gl.sfs(yourdata, folded=FALSE)

which should give you the non-folded version, that should not break (I am writing that without testing).

I am busy today but happy to have a look at the function the coming week.

Cheers, Bernd


From: 'Matthew Holen' via dartR <da...@googlegroups.com>
Sent: Friday, 15 May 2026 5:25 AM
To: dartR <da...@googlegroups.com>
Subject: [dartR] gl.sfs
 
Hello,

Id like to create a site frequency spectrum. I have three populations (which I believe are diploid, triploid, and tetraploid). When I use the gl.sfs() function it outputs a lot of numbers, but no plot. Any suggestions?

Thanks,
Matt

--
You received this message because you are subscribed to the Google Groups "dartR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dartr+un...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/dartr/97142ebd-a138-458e-875e-72916438fb1an%40googlegroups.com.

Jose Luis Mijangos

unread,
May 14, 2026, 8:33:15 PM (6 days ago) May 14
to dartR
Hi Matt,

The function only produces a plot when there is just one population in the dataset. That's the assumption of sfs. You can try splitting your populations as below.

t1 <- platypus.gl
t2 <- seppop(t1)
lapply(t2,gl.sfs)

Cheers,
Luis 

Matthew Holen

unread,
May 19, 2026, 10:18:48 AM (2 days ago) May 19
to dartR
Bernd and Luis,

Appreciate the reply, the generated sfs plots didn't show anything of interest, but I think I found a workaround for my data looking at the allele frequencies as histograms for the suspected ploidy pops. This is using the vcf file generated after realignment to a new genome to recover more SNPs than our normal SNP library. Don't think I can share the plot, but the histograms generally follow the distribution of ~0.5 for diploids, 0.33 / 0.66 for triploids, and 0.25 / 0.5 / 0.75 for tetraploids. Flow cytometry will have to be used for confirmation in my case. 

vcf_data <- read.vcfR("final_biallelic_snps.vcf")
ad <- extract.gt(vcf_data, element = 'AD')
colnames(ad)
matching_indices <- match(colnames(ad), metadata$id)
new_names <- metadata$genotype[matching_indices]
colnames(ad)[!is.na(new_names)] <- new_names[!is.na(new_names)]
#### Loop through the populations based on metadata column ####
pop_list <- unique(metadata$ploidy)
pop_list

pdf("Population_Allele_Balance.pdf", width = 8, height = 12)
par(mfrow = c(2, 1))

for (p in pop_list) {
 
  # Get all individual IDs belonging to this population
  pop_ids <- metadata$genotype[metadata$ploidy == p]
 
  # Concatenate all AD data for all individuals in this pop
  # This creates one massive pool of read counts for the population
  pop_ad <- as.vector(unlist(ad[, pop_ids]))
 
  # Calculate ratios for the entire population pool
  # change >= 30 for read depth which helps reduce noisy plots
  ratios_pop <- sapply(strsplit(pop_ad, ","), function(x) {
    counts <- as.numeric(x)
    total <- sum(counts)
    if(!is.na(total) && total >= 30) return(counts[2] / total) else return(NA) 
  })
 
  ratios_pop <- ratios_pop[!is.na(ratios_pop)]
 # drops homozygous reference or alternate reads, only keeping heterozygous
  ratios_het <- ratios_pop[ratios_pop > 0 & ratios_pop < 1]
 
  # Plot
  hist(ratios_het,
       breaks = 100,
       col = "darkorchid",
       main = paste("Population:", p, "| Sites:", length(ratios_het)),
       xlab = "Alternative Allele Ratio",
       xlim = c(0, 1))
 
  abline(v = c(0.25, 0.33, 0.5, 0.66, 0.75), col = "red", lty = 3)
}

dev.off()
Reply all
Reply to author
Forward
0 new messages