How to determine the number of loci?


Erkaya Mustafa

Jun 14, 2023, 4:18:03 PM
to structure-software
Hi y'all and Dr. Banta!

I followed Josh Banta's Structure tutorial on YouTube and I got my results! Thank you so much Dr. Banta!

But I still have a question. For my study, I have 1112 bp long sequences, and I'd like to know the number of loci in them. When I converted my files from FASTA to STRUCTURE format by following Dr. Banta's tutorial, each nucleotide was counted as one locus.

This doesn't feel right. Can someone explain how I can determine the number of loci in my sequences?

Mustafa Erkaya

Josh Banta

Jun 14, 2023, 4:40:31 PM
to structure...@googlegroups.com
Dear Erkaya,

Each base pair site is a locus. But I think what you would like to know is how many polymorphic (variable) loci you have. I believe the STRUCTURE output contains this information. If not, you could import your data into R and get the answer there. Something like this:

##not tested, may have bugs

##set working directory!

install.packages("ape")
library(ape)

#have your data in the working directory in FASTA format, and replace "your_filename.fas" below with the name of your file

x <- read.dna("your_filename.fas", format = "fasta", as.character = TRUE)

d3 <- apply(x, 2, function(d) {
  d1 <- d[which(!is.na(d))]
  d2 <- table(d1)
  length(summary(d2)) > 1
})

#this is the answer:
length(which(d3 == TRUE))



Erkaya Mustafa

Jun 14, 2023, 7:12:10 PM
to structure-software
Hi Dr. Banta. 

Yes, indeed that's what I'm trying to do!

Unfortunately, the outcome of this code is exactly the same as what I already had. Here's the code I ran.

##not tested, may have bugs

##set working directory!

install.packages("ape")
library(ape)

#have your data in the working directory in FASTA format, and replace "your_filename.fas" below with the name of your file

x <- read.dna("PKDREJ_Conc_B.septuiclavis_Except_179116.fasta", as.character = T, format = "fasta")

d3 <- apply(x, 2, function(d) {
  d1 <- d[which(!is.na(d))]
  d2 <- table(d1)
  length(summary(d2)) > 1
})

length(which(d3 == TRUE))


And here's the result I got:

[1] 1112

So, nothing is different. And I think Structure shows me the same results as well. Just 1112 loci.

Josh Banta

Jun 14, 2023, 8:50:15 PM
to structure...@googlegroups.com
Dear Erkaya,

My code had a bug in it! Please try again:

##not tested, may have bugs

##set working directory!

install.packages("ape")
library(ape)

#have your data in the working directory in FASTA format, and replace "your_filename.fas" below with the name of your file

x <- read.dna("PKDREJ_Conc_B.septuiclavis_Except_179116.fasta", as.character = T, format = "fasta")

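#for each alignment column (site), test whether more than one distinct state occurs
d3 <- apply(x, 2, function(d) {
  d1 <- d[which(!is.na(d))]  #drop NA entries
  d2 <- table(d1)            #count each distinct state in the column
  length(d2) > 1             #TRUE if the site is polymorphic
})

#number of polymorphic sites
length(which(d3 == TRUE))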
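
For readers wondering what the bug was, here is a minimal sketch. It uses a made-up column in which every sequence carries the same base, and it suggests why the first version flagged every site as variable. Calling summary() on a table returns a list of summary fields rather than the per-state counts, so its length does not depend on how many distinct bases the column holds. The vector `col` is a hypothetical example, not data from this thread.

```r
# Hypothetical alignment column: all five sequences carry "a" (monomorphic site)
col <- c("a", "a", "a", "a", "a")
d2 <- table(col)

# First version of the test: summary() of a table is a list of summary
# fields (number of variables, number of cases, ...), so this is TRUE
# even though the site does not vary
buggy <- length(summary(d2)) > 1

# Corrected test: a table has one entry per distinct state, so a
# monomorphic column gives length 1 and the test is FALSE
fixed <- length(d2) > 1

print(c(buggy = buggy, fixed = fixed))
```

With the corrected test, a column only counts as polymorphic when table() finds at least two distinct states in it.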



Erkaya Mustafa

Jun 15, 2023, 7:06:12 AM
to structure-software
Amazing! Now it shows I have 98 loci!

Can you explain how this works? How does it detect the number of polymorphic (variable) loci?

Thank you so much!

ME

banta....@gmail.com

Nov 16, 2025, 4:26:19 PM
to structure-software
Better late than never. 

The code I gave you works as follows:

1) Reads a FASTA file into R as aligned DNA sequences.

2) Treats each column of the alignment as a single locus / nucleotide site.

3) For each column, checks whether more than one unique nucleotide is present (excluding missing values such as NA).

4) Counts how many columns are polymorphic, meaning they contain variation across sequences.
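
The four steps can be sketched on a toy alignment without reading a file. The matrix `aln` below is made up for illustration (rows are sequences, columns are sites) and stands in for what read.dna(..., as.character = TRUE) returns. One caveat, which is an assumption about your data: ape represents gaps and unknown bases as the characters "-", "n", and "?" rather than NA, so the is.na() filter does not remove them. The helper `is_polymorphic()` is a hypothetical name, and dropping those characters explicitly is only one possible way to handle this.

```r
# Toy alignment: 4 sequences (rows) x 5 sites (columns), lowercase as in ape
aln <- rbind(
  seq1 = c("a", "c", "g", "t", "a"),
  seq2 = c("a", "c", "g", "-", "a"),
  seq3 = c("a", "t", "g", "t", "n"),
  seq4 = c("a", "c", "g", "t", "a")
)

# States to ignore when comparing bases (assumption: gaps and unknown
# bases should not count as variation)
missing_states <- c("-", "n", "?")

# TRUE if a column holds more than one distinct state after dropping
# NA and, optionally, the missing-data characters
is_polymorphic <- function(d, ignore = character(0)) {
  d1 <- d[!is.na(d) & !(d %in% ignore)]
  length(unique(d1)) > 1
}

# As in the thread's code: every distinct character counts as a state
naive <- apply(aln, 2, is_polymorphic)

# Stricter variant: gaps and unknown bases are excluded first
strict <- apply(aln, 2, is_polymorphic, ignore = missing_states)

sum(naive)   # sites 2, 4 and 5 vary when "-" and "n" count as states
sum(strict)  # only site 2 varies between real bases
```

If the two counts differ on real data, the difference is the number of sites whose only apparent variation comes from gaps or unknown bases.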

Best,
Josh
