How to run Structure faster on a Linux server


pinbo

Oct 27, 2011, 6:11:08 PM
to structure-software
Hi All,

I am running Structure 2.1 from the command line on a Linux server via
SSH, and I found the speed was almost the same as on my laptop. Is
there a way to speed up the process? Can Structure take advantage of
multiple processors, possibly through multithreading? Or is it
possible to have multiple instances of Structure running on multiple
processors?

Thanks.

Vikram Chhatre

Oct 27, 2011, 6:16:47 PM
to structure...@googlegroups.com
Junli -

Did you mean version 2.3.3?

As far as I know, Structure supports parallelization on multithreaded
processors. Until earlier this year, Cornell University had a
publicly accessible computing cluster that did this. Unfortunately,
that cluster is no longer available to non-Cornell people. In any
case, you could ask their admin how to set up parallelization. Here
is the website: http://cbsuapps.tc.cornell.edu/structure.aspx

Vikram

> --
> You received this message because you are subscribed to the Google Groups "structure-software" group.
> To post to this group, send email to structure...@googlegroups.com.
> To unsubscribe from this group, send email to structure-softw...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/structure-software?hl=en.
>
>

pinbo

Oct 27, 2011, 7:02:45 PM
to structure-software
Hi Vikram,

When I ran Structure 2.3.3 on the server, I got an error message:

"./structure: /lib/libc.so.6: version `GLIBC_2.7' not found (required
by ./structure)".

The administrator said they only have glibc 2.5 installed: "To fix
this you will need to download and install a compatible version of
glibc to your home directory and then recompile structure to use that
library from your home directory." That is too complex for me, so I
tried Structure 2.1, which works well on this server, and I am now
using V2.1. If you have a better idea for solving this problem, that
would be a great help.
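
For anyone hitting the same error, one way to confirm the mismatch is to compare the installed glibc version with the one named in the message. This is only a sketch: the version_ge helper is made up for illustration, and getconf GNU_LIBC_VERSION is only expected to work on glibc-based systems.

```shell
#!/bin/sh
# version_ge A B: succeeds when dotted version A >= dotted version B.
# (Hypothetical helper, written with awk so it does not need "sort -V".)
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

required="2.7"   # the version named in the error message above

# getconf prints something like "glibc 2.5"; keep only the number.
installed="$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}')"

if [ -z "$installed" ]; then
    echo "could not determine the installed glibc version"
elif version_ge "$installed" "$required"; then
    echo "glibc $installed satisfies GLIBC_$required"
else
    echo "glibc $installed is older than the required $required"
fi
```

On the server described here, the last branch would be the expected outcome, since 2.5 is older than 2.7.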

Does Structure 2.1 support parallelization on multithreaded
processors? Anyway, I will ask the admin at Cornell.

Thanks a lot, Vikram.

Junli

Vikram Chhatre

Oct 27, 2011, 7:25:51 PM
to structure...@googlegroups.com
Unfortunately, many sysadmins do not see the need to upgrade glibc.
I would agree that updating glibc, then recompiling Structure and
satisfying all the dependencies may not be trivial. What flavor and
version of Linux is this server running?

I would suggest looking at the change log between 2.1 and 2.3.3 to
see what functionality, if any, you will not have access to with 2.1.

Vikram

pinbo

Oct 27, 2011, 8:58:16 PM
to structure-software
Hi Vikram,

The server is "CentOS release 5.5 (Final)".
I checked the change log and it seems there are a lot of improvements
between 2.1 and 2.3. I hope I can still use 2.3, but I do not know
how to change the source so that it uses the glibc I installed.

Thanks.

Junli

Vikram Chhatre

Oct 27, 2011, 9:40:12 PM
to structure...@googlegroups.com
Junli -

It seems you're in a tight spot. I was just talking to some folks at
#centos and was told that CentOS 5.5 is at least two major development
cycles behind and has serious security issues. Apparently, the most
current distribution in the 5.x range is 5.7. And then, there is
version 6.x and such.

Installing glibc in the home folder would expose the server to
exploits from outside the network. It is a little startling that your
sysadmin suggested installing glibc in the home folder.

So long story short, unless that server is upgraded fully or the
sysadmin offers to at least upgrade the glibc systemwide, you may be
stuck with 2.1. You may want to explore alternate servers for your
analysis.

HTH
V

pinbo

Oct 27, 2011, 11:31:56 PM
to structure-software
Hi Vikram,

Thanks for so many good suggestions. I think I will give up on this
server. I have almost finished running Structure on my laptop, yet I
am still trying to figure out the server. That is so sad.

Junli

griffinia

Oct 29, 2011, 12:31:54 PM
to structure-software
STRUCTURE does NOT support parallel processing. I asked the Cornell
folks how they set it up, and they told me that 1) their cluster is
actually Windows, not Linux, and 2) they wrote a script that just
splits up the job and parcels it out to different nodes in the
cluster. Of course, as Vikram points out, it is a moot point unless
you have a Cornell IP.

The Bioportal at the University of Oslo (http://www.bioportal.uio.no/)
does, however, allow guest accounts, and that is where I run most of
my large batch STRUCTURE jobs now. A job cannot contain more than 100
replicated runs, so if you are, let's say, running k = 1 to 20, you
submit 4 jobs of k = 1-5, 6-10, etc. I run fairly high numbers of MCMC
iterations (100K burn-in, 1 million) at k ranges of 1 - 30 and have
never had to wait more than a few days for my jobs to complete. You
need to generate a mainparams file on your local machine for your
data set to upload with your data file. It doesn't matter what k you
specify in the mainparams file because you set that in the job form
on the Bioportal.
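
The splitting arithmetic above can be sketched as a small shell function. This is only an illustration: split_k_range is a made-up helper, and the figure of 20 replicates per K is an assumption chosen so that 5 values of K fill one 100-run job, as in the example.

```shell
#!/bin/sh
# split_k_range K_MIN K_MAX REPS MAX_RUNS
# Prints one "first-last" K range per job so that
# (number of K values in the job) * REPS never exceeds MAX_RUNS.
split_k_range() {
    k_min=$1; k_max=$2; reps=$3; max_runs=$4
    per_job=$((max_runs / reps))      # how many K values fit in one job
    [ "$per_job" -ge 1 ] || { echo "REPS exceeds MAX_RUNS" >&2; return 1; }
    start=$k_min
    while [ "$start" -le "$k_max" ]; do
        end=$((start + per_job - 1))
        [ "$end" -le "$k_max" ] || end=$k_max   # last job may be shorter
        echo "$start-$end"
        start=$((end + 1))
    done
}

# k = 1 to 20, 20 replicates per K (assumed), at most 100 runs per job
split_k_range 1 20 20 100
```

With these inputs the function prints four ranges (1-5, 6-10, 11-15, 16-20), one per job to submit.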

In order to run your results files in Structure Harvester, you need to
append "_f" to the end and get rid of the extension. I use the free
program Rename Master to do this in batch.

Alan

Junli Zhang

Oct 29, 2011, 12:57:07 PM
to structure...@googlegroups.com
Hi Alan,

Very helpful information. Thanks a lot.
Now I just use "screen" to run separate "structure" processes on the server, so I can run several Structure instances at the same time and shorten the total run time. I do not know whether there are better ideas. I also found that each structure process only uses 24-26 MB of memory, but I do not know whether increasing its memory usage would increase the speed, or how to increase it.

Junli


Vikram Chhatre

Oct 29, 2011, 1:07:05 PM
to structure...@googlegroups.com
Alan -

Great info! Also, thank you for letting us know about the Bioportal
at the University of Oslo.

V

Dent

Oct 30, 2011, 9:46:28 AM
to structure-software
Hi all,

I'm the author of Structure Harvester
(http://taylor0.biology.ucla.edu/structureHarvester/). Last week, at
the request of a user, I made a stab at handling the output files from
the Bioportal at the U of Oslo. The Harvester should now also take
Structure files named '*_f.N', where N is some integer. Hopefully
this saves everyone a rename step. If you find this feature doesn't
work, please email me!

And regarding parallelizing on a Linux machine: I think your best bet
is to break your job into as many pieces as you have processors and
then launch each piece in the background (or in GNU screen). Since
each replicate of each K is an independent run, you could break your
job up into as many as (number of K values) * (number of replicates)
pieces, but running more simulations simultaneously than you have
processors will end up slowing your server down much more than if you
ration things out. For example, let's say I'm running K = 1..16 for 10
reps (160 runs total) and I have 16 processors to play with: I would
create 16 parallel jobs with 10 runs each, each job consisting of a
*serial* run of Structure for a single value of K on replicates 1..10.
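
A minimal shell sketch of that scheme follows: one background job per K, each running its replicates one after another. The results/ directory, the output file names, and the DRY_RUN switch are assumptions made for illustration; -K and -o are the flags that appear in the command quoted elsewhere in this thread. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# One background job per K; replicates for a given K run serially.
# Assumed for illustration: results/ directory, file naming, DRY_RUN switch.
STRUCTURE=${STRUCTURE:-./structure}
K_MIN=${K_MIN:-1}
K_MAX=${K_MAX:-16}
REPS=${REPS:-10}
OUTDIR=${OUTDIR:-results}

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run_cmd() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run all replicates for a single K, one after another.
run_one_k() {
    k_val=$1
    rep=1
    while [ "$rep" -le "$REPS" ]; do
        run_cmd "$STRUCTURE" -K "$k_val" -o "$OUTDIR/k${k_val}_run${rep}"
        rep=$((rep + 1))
    done
}

run_cmd mkdir -p "$OUTDIR"
k=$K_MIN
while [ "$k" -le "$K_MAX" ]; do
    run_one_k "$k" &    # one parallel job per K
    k=$((k + 1))
done
wait                    # block until every background job has finished
```

Setting DRY_RUN=0 would launch the runs for real. Keep K_MAX - K_MIN + 1 at or below the number of processors so the machine is not oversubscribed.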

Best,

d

Junli Zhang

Oct 30, 2011, 12:54:37 PM
to structure...@googlegroups.com
Hi Dent,

Great improvement! Thanks.
I have one question about running many pieces of Structure simultaneously. Last time, I used GNU screen to run several pieces of Structure at the same time, each K with 3 reps. I had Structure save the output into the same folder but under different file names. However, I found that the LnPD values of the 3 reps were the same for each K, so I wonder whether there is some interaction among the different reps of the same K.

Thanks.

Junli


Dent

Oct 31, 2011, 11:45:43 AM
to structure-software
Hi Junli,

That's odd, perhaps someone else can speak to that. I've certainly
seen independent runs come up with the same value of LnPD, usually for
K=1. In fact it happens with such regularity that there is code in the
Harvester to check for this condition; when all replicates from a K
have the same LnPD it makes the standard deviation for that K value
equal to 0 which prevents one from performing the Evanno test. I
usually recommend running more replicates, somewhere between 10 and 20
per K depending on how much extra time and compute resources you have
to spare.
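
To illustrate the zero-standard-deviation condition, here is a small sketch that computes the sample standard deviation of a set of LnPD values. It is not the Harvester's actual code: lnpd_sd is a made-up helper and the input values are invented. As I understand the Evanno method, Delta K divides by this standard deviation, which is why a value of 0 blocks the test.

```shell
#!/bin/sh
# lnpd_sd V1 V2 ...: print the sample standard deviation of the values.
# (Hypothetical helper; two-pass calculation to limit rounding error.)
lnpd_sd() {
    printf '%s\n' "$@" | awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR < 2) { print "nan"; exit }
            mean = sum / NR
            for (i = 1; i <= NR; i++) ss += (v[i] - mean) ^ 2
            printf "%.4f\n", sqrt(ss / (NR - 1))
        }'
}

# Invented LnPD values for three identical replicates of one K.
sd=$(lnpd_sd -4523.1 -4523.1 -4523.1)
if [ "$sd" = "0.0000" ]; then
    echo "SD is 0: replicates are identical, Delta K is undefined for this K"
else
    echo "SD = $sd"
fi
```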

But if all of your replicates for all of your K values are the same,
then I wonder if it's time to start tweaking your MCMC parameters.
Anyone else have any insight?

HTHs!

d

Adii_

Nov 2, 2011, 9:35:13 AM
to structure-software
Hi Junli!
Give us the K's. Dent has a point about K=1 and a low number of reps.
We need more details to give any advice.

Have Fun, Adii_

Junli Zhang

Nov 2, 2011, 11:52:34 AM
to structure...@googlegroups.com
Hi Adii_,

I tried K from 3 to 8, with Burnin=100000 and Numreps=100000, and 3 runs for each K. The results of the 3 runs were the same for each K. The Structure version is 2.1, because the server does not support 2.3.3.

I also tried other Ks, with Burnin=100 and Numreps=100, and 3 runs, just as a quick test. The results were still the same.
Thanks.

Junli

Junli Zhang

Nov 2, 2011, 12:31:37 PM
to structure...@googlegroups.com
Hi All,

Attached are the shell script I used (ms3to5) and the Structure results, to give more details of the problem.
Thanks.

Junli
ms3to5.txt
resultsbak.zip

Vikram Chhatre

Nov 2, 2011, 1:11:05 PM
to structure...@googlegroups.com
Junli -

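This is unrelated to the discussion below, but I have a comment about
your shell script. You're losing the runtime screen output that
Structure produces, which may be important for checking convergence of
parameters. That output can be captured as follows (the pipeline is
wrapped in sh -c so that the redirection and tee run inside the
detached screen session rather than on screen's own output; the logs/
directory must already exist):

screen -S t31 -d -m sh -c './structure -K 3 -o results2/gynohybrid_k3_run1 2>&1 | tee logs/gynohybrid_k3_run1.log'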

V

Junli Zhang

Nov 2, 2011, 1:29:21 PM
to structure...@googlegroups.com
Hi Vikram,

I will open a new topic after trying the modified script. Thanks.

Junli