Problem with gl.run.structure using large data


Comlan Arnaud Gouda

Jun 7, 2023, 3:49:39 PM
to dartR

Hi all,

 

I have run into a problem running a STRUCTURE analysis using the gl.run.structure command in the dartR package.

I first converted my data (9K samples with 27K SNPs) from VCF format to a genlight object.

Then I used the script below to run the STRUCTURE analysis on a Windows system (32 RAM memory):

 

library(dartR)

gl11b <- readRDS("mygenlightdata.RDS")

sr <- gl.run.structure(
  gl11b,
  k.range = 2:10,
  num.k.rep = 10,
  burnin = 10000,
  numreps = 100000,
  noadmix = FALSE,
  exec = "D:/Essaie/structure.exe",
  plot.out = TRUE,
  save2tmp = TRUE)

 

Unfortunately, I am getting the following output:

 

Reading file "gtypes.created.on.2023-06-07.12.14.28.structureRun/gtypes.created.on.2023-06-07.12.14.28.structureRun.k1.r1_mainparams".

datafile is

gtypes.created.on.2023-06-07.12.14.28.structureRun/gtypes.created.on.2023-06-07.12.14.28.structureRun.k1.r1_data

Reading file "gtypes.created.on.2023-06-07.12.14.28.structureRun/gtypes.created.on.2023-06-07.12.14.28.structureRun.k1.r1_extraparams".

Note: RANDOMIZE is set to 1. The random number generator will be initialized using the system clock, ignoring any specified value of SEED.

Error in assigning memory (not enough space?)

 

Exiting the program due to error(s) listed above.

 

Error in FUN(X[[i]], ...) :

  Error running STRUCTURE. Error code 1 returned.

 

Any ideas about what I can fix would be really appreciated!

 

Thanks,

Arnaud 

 

Bernd.Gruber

Jun 7, 2023, 10:09:29 PM
to da...@googlegroups.com

Hi Arnaud,

 

Based on your folder name, I assume you are running this on Windows. There is a memory limitation in R on Windows (less so on Linux/macOS), so I assume you need to use a different OS. You could also try to increase the memory limit using R's own mechanism (memory.limit()), but this is no longer supported from R 4.2 onward, if I am not mistaken.

 

 

So most likely you need to switch to a Linux-based R.

 

Cheers, Bernd

 

 

 

==============================================================================

Dr Bernd Gruber                                              )/_         

                                                         _.--..---"-,--c_    

Professor Ecological Modelling                      \|..'           ._O__)_     

Tel: (02) 6206 3804                         ,=.    _.+   _ \..--( /          

Fax: (02) 6201 2328                           \\.-''_.-' \ (     \_          

Institute for Applied Ecology                  `'''       `\__   /\          

Faculty of Science and Technology                          ')                

University of Canberra   ACT 2601 AUSTRALIA

Email: bernd....@canberra.edu.au

WWW: bernd-gruber

 

Australian Government Higher Education Provider Number CRICOS #00212K 

NOTICE & DISCLAIMER: This email and any files transmitted with it may contain
confidential or copyright material and are for the attention of the addressee
only. If you have received this email in error please notify us by email
reply and delete it from your system. The University of Canberra accepts
no liability for any damage caused by any virus transmitted by this email.

==============================================================================

--
You received this message because you are subscribed to the Google Groups "dartR" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dartr+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dartr/d0e533c8-2006-41a2-a5c3-54e423643384n%40googlegroups.com.

Jose Luis Mijangos

Jun 7, 2023, 10:26:34 PM
to dartR
Hi Arnaud,

Running Structure on such a large dataset with those settings requires a lot of computing power; on a personal computer the analysis would take a very long time, probably months.
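To put the scale in numbers, here is a rough back-of-the-envelope sketch in base R. It uses only the figures quoted in the original post (9K samples, 27K SNPs, k.range = 2:10, num.k.rep = 10, burnin = 10000, numreps = 100000). The variable names are made up for illustration, diploid genotypes are an assumption, and the counts say nothing about actual run time or memory use, which depend on the machine and on how STRUCTURE stores the data internally.

```r
# Figures taken from the original post; variable names are illustrative only.
n_samples <- 9000
n_loci    <- 27000
k_values  <- 2:10
num_k_rep <- 10
burnin    <- 10000
numreps   <- 100000

# One STRUCTURE run per (K, replicate) combination.
n_runs <- length(k_values) * num_k_rep

# Every run performs burnin + numreps MCMC iterations.
iterations_per_run <- burnin + numreps
total_iterations   <- n_runs * iterations_per_run

# Assumption: diploid data, i.e. two allele copies per sample per locus.
allele_copies <- n_samples * n_loci * 2

cat("independent runs:          ", n_runs, "\n")
cat("MCMC iterations in total:  ",
    format(total_iterations, big.mark = ",", scientific = FALSE), "\n")
cat("allele copies in the data: ",
    format(allele_copies, big.mark = ",", scientific = FALSE), "\n")
```

Reducing any one of these factors (fewer loci, fewer K values, fewer replicates, shorter chains) shrinks the workload proportionally, which is the idea behind the options in this reply.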

Some options are:
- Run Structure on a high-performance computing cluster, see this paper: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1593-0
- Try the program fastStructure (https://rajanil.github.io/fastStructure/), although it is a bit troublesome to get running. You could use the function gl2faststructure to convert a genlight object into fastStructure format. Results could be processed using Clumpak (http://clumpak.tau.ac.il/).
- Filter your dataset very stringently and subsample loci (using the function gl.subsample.loci and choosing the most informative loci, i.e. method = "pic"). About 1,000 loci might be enough, depending on the level of genetic structure in your dataset; the articles below give an idea of how many loci to use.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3797491/ 
https://onlinelibrary.wiley.com/doi/full/10.1111/1755-0998.12650
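The third option could look roughly like the sketch below. It assumes dartR is installed and wraps the gl.subsample.loci call mentioned above; the helper name subsample_for_structure is hypothetical, and the default of 1,000 loci is just the figure suggested in this post. Whether method = "pic" works on a genlight object built from a VCF file may depend on which locus metrics the object carries, so check ?gl.subsample.loci first.

```r
# Hypothetical helper (not part of dartR): keep the n_loci most informative loci.
subsample_for_structure <- function(gl, n_loci = 1000) {
  if (!requireNamespace("dartR", quietly = TRUE)) {
    stop("Package 'dartR' is required for this sketch.")
  }
  # Never ask for more loci than the object actually holds.
  n_keep <- min(n_loci, adegenet::nLoc(gl))
  # method = "pic" is the setting suggested in the post above.
  dartR::gl.subsample.loci(gl, n = n_keep, method = "pic")
}

# Usage (not run here; file name taken from the original post):
# gl11b    <- readRDS("mygenlightdata.RDS")
# gl_small <- subsample_for_structure(gl11b, n_loci = 1000)
```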

Cheers,
Luis 

Bernd.Gruber

Jun 7, 2023, 10:31:52 PM
to da...@googlegroups.com

Hi Jose,

 

 

I would try running a subset of your data set first (e.g. only 100 loci) to see if it works.

But Luis is right: most likely you would need to run it on a cluster, or run fastStructure, which is faster (but different, even though the name suggests it is the same).
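A pilot run along those lines might be sketched as follows. This assumes dartR is installed and that column indexing on the genlight object selects loci; the helper name pilot_structure_run and the deliberately tiny chain settings are illustrative choices, not recommended values. The gl.run.structure arguments are the ones already used in the original post.

```r
# Hypothetical helper (not part of dartR): quick smoke test on a few loci.
pilot_structure_run <- function(gl, exec, n_loci = 100) {
  if (!requireNamespace("dartR", quietly = TRUE)) {
    stop("Package 'dartR' is required for this sketch.")
  }
  # Take the first n_loci loci (columns of a genlight object are loci).
  n_keep   <- min(n_loci, adegenet::nLoc(gl))
  gl_pilot <- gl[, seq_len(n_keep)]
  # Very short chains: only meant to show whether STRUCTURE starts and finishes.
  dartR::gl.run.structure(
    gl_pilot,
    k.range   = 2:3,
    num.k.rep = 1,
    burnin    = 100,
    numreps   = 1000,
    exec      = exec)
}

# Usage (not run here; paths taken from the original post):
# gl11b <- readRDS("mygenlightdata.RDS")
# pilot <- pilot_structure_run(gl11b, exec = "D:/Essaie/structure.exe")
```

If the pilot completes, the memory error is more likely tied to the size of the full data set than to the installation.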

 

Cheers, Bernd

 

 

 


GOUDA

Jun 8, 2023, 8:40:42 AM
to da...@googlegroups.com

Hello Luis and Bernd,

Thank you for all the suggestions you have given me to solve the problem,

I will try to use the FastStructure program.

Cheers,

Arnaud

--
Regards, Arnaud.

Hewan Demissie

Jul 4, 2023, 5:01:31 PM
to dartR
Hello Luis, 
I read the comments and suggestions in this thread.
Now, when I run the function with my sample, it gives me this response: "Error in FUN(X[[i]], ...) : You do not have STRUCTURE installed"

Please help.
This is my command:
out<- gl.run.structure(
  gl,
  k.range = 1:5,
  num.k.rep = 10,
  burnin = 30, # 30,000
  numreps = 100, # 1,000,000
  exec = "D:/research/Avocado/GBS-Avocado/STRUCTURE",
  noadmix=FALSE)

out_evanno4 <- gl.evanno(out)
qmat4 <- gl.plot.structure(out, K=2, colors_clusters = c("#1F78B4", "#33A02C"))

Bernd.Gruber

Jul 4, 2023, 7:19:14 PM
to da...@googlegroups.com

Hi,

 

Assuming you are running STRUCTURE under Windows, you need to provide the full path to the exe file, e.g.

 

 

D:/research/Avocado/GBS-Avocado/STRUCTURE/structure.exe

 

 

If under macOS or Linux, you need to use

 

./research/Avocado/GBS-Avocado/STRUCTURE/structure

 

 

Then it should run,

 

Cheers, Bernd
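A small base R check can catch this before gl.run.structure is called. The helper below is hypothetical (it is not part of dartR) and only tests that exec points to an existing file rather than to a folder, which is the situation in the command above.

```r
# Hypothetical helper, not part of dartR: validate the 'exec' argument.
check_structure_exec <- function(exec) {
  # file.exists() is also TRUE for folders, so test for a folder first.
  if (dir.exists(exec)) {
    stop("'exec' points to a folder; append the file name of the binary: ", exec)
  }
  if (!file.exists(exec)) {
    stop("No file found at: ", exec)
  }
  # Return the full path with forward slashes, as used elsewhere in this thread.
  normalizePath(exec, winslash = "/")
}

# Usage (not run here; path taken from the example above):
# exec <- check_structure_exec("D:/research/Avocado/GBS-Avocado/STRUCTURE/structure.exe")
# out  <- gl.run.structure(gl, k.range = 1:5, num.k.rep = 10, exec = exec)
```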

 

 


Hewan Demissie

Jul 7, 2023, 8:47:11 AM
to dartR
Thank you, Bernd.
What is this exe file? Do I need to save the gl in the STRUCTURE software?
I read about the exe file but did not understand it.
Please help.

Jose Luis Mijangos

Jul 24, 2023, 1:40:49 AM
to dartR
Hi,

First, you should read the documentation of the function, which you can open by running the below command in the R console:

> ?gl.run.structure

In the documentation, you will find, among other things, instructions on how to run the function (examples) and a link to download the "exe" file (also called the "executable" or "binary" file) for your system. These files are already "compiled" and do not need to be installed on your machine to run.

The easiest way to do this is to move the Structure binary file to your working directory, whose location you can get by running the below command in the R console:

> getwd()

Then, in the function's "exec" parameter, you should use:

> library(dartR)
> # filtering loci with all missing data
> t1 <- gl.filter.allna(platypus.gl)
> # running Structure in macOS
> res <- gl.run.structure(t1, exec = "./structure")
> # running Structure in Windows
> res <- gl.run.structure(t1, exec = "./structure.exe")

Note that the default values of the function are not to be used for a final run, read more in: 

Cheers,
Luis 
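The working-directory approach described above can be sketched in base R as follows. The binary file names follow the example in this post; everything else (the variable names and the messages) is illustrative. The gl.run.structure call is left commented out because it needs dartR and the STRUCTURE binary.

```r
# Choose the binary name by operating system (names as in the post above).
exec_name <- if (.Platform$OS.type == "windows") "structure.exe" else "structure"

# Build the full path to the binary inside the current working directory.
exec_path <- file.path(getwd(), exec_name)

if (file.exists(exec_path)) {
  message("Found STRUCTURE binary: ", exec_path)
} else {
  message("No '", exec_name, "' in ", getwd(), "; copy the binary there first.")
}

# Not run here:
# library(dartR)
# t1  <- gl.filter.allna(platypus.gl)
# res <- gl.run.structure(t1, exec = exec_path)
```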