STRUCTURE: Raft of problems

John Lindberg

Jul 7, 2022, 12:14:55 PM
to structure-software
Hi all,

I am trying to run STRUCTURE (command-line version, on Ubuntu WSL) on a large dataset (822 individuals, 38987 loci). I converted the original VCF to a STRUCTURE file using PGDSpider and then made sure the file is in UNIX format rather than Windows format. The data have popdata (41 different populations). However, I get the following error messages, and I have tried, to no avail, to find out why this problem keeps occurring:

WARNING! Probable error in the input file.
Individual 822, locus 18260;  encountered the following data
"NA19239" when expecting an integer

readlociEOF

WARNING:  Unexpected end of input file.  The details of the
input file are set in mainparams.  I ran out of data while reading
the data for individual 822.

----------------------------------
There were errors in the input file (listed above). According to
"mainparams" the input file should contain one row of markernames with 38987 entries,
 822 rows with 77977 entries. .

There are 1645 rows of data in the input file, with an average of 38987.00
entries per line.  The following shows the number of entries in each
line of the input file:

# Entries:   Line numbers
     38985:   1
     38987:   2--1645
----------------------------------
The mainparams file looks like this:

#define NUMINDS 822
#define NUMLOCI 38987
#define LABEL 1
#define POPDATA 1
#define POPFLAG 0
#define LOCDATA 0
#define PHENOTYPE 0
#define MARKERNAMES 1
#define MAPDISTANCES 0
#define ONEROWPERIND 1
#define PHASEINFO 0
#define PHASED 0
#define RECESSIVEALLELES 0
#define EXTRACOLS 1
#define MISSING -9
#define PLOIDY 2
#define MAXPOPS 2
#define BURNIN 100
#define NUMREPS 100

#define NOADMIX 0
#define LINKAGE 0
#define USEPOPINFO 0

#define LOCPRIOR 0
#define INFERALPHA 1
#define ALPHA 1.0
#define POPALPHAS 0
#define UNIFPRIORALPHA 1
#define ALPHAMAX 10.0
#define ALPHAPROPSD 0.025

#define FREQSCORR 1
#define ONEFST 0
#define FPRIORMEAN 0.01
#define FPRIORSD 0.05

#define INFERLAMBDA 0
#define LAMBDA 1.0
#define COMPUTEPROB 1
#define PFROMPOPFLAGONLY 0
#define ANCESTDIST 0
#define STARTATPOPINFO 0
#define METROFREQ 10

#define UPDATEFREQ 1 

Any ideas on why these errors keep cropping up? And why is STRUCTURE saying that the input file should contain 822 rows with 77977 entries, when there are only 38987 loci?

Thanks in advance!

John 
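
The 77977 figure in the error summary can be reproduced with simple arithmetic. The sketch below is not taken from the STRUCTURE source code. It assumes that each data row holds one leading column per enabled flag (LABEL, POPDATA, POPFLAG, LOCDATA, PHENOTYPE) plus EXTRACOLS, followed by PLOIDY x NUMLOCI allele entries when ONEROWPERIND is 1, or NUMLOCI entries on each of PLOIDY rows when it is 0. The helper name `expected_layout` and the `candidate` settings are hypothetical: `candidate` is one set of values that happens to match the reported file layout (1644 data rows of 38987 entries, 38985 marker names), not a diagnosis confirmed anywhere in this thread.

```python
def expected_layout(p):
    """Return (data_rows, entries_per_row) implied by mainparams-style flags.

    Assumption: one leading column per enabled flag, plus EXTRACOLS.
    Rows added by other options (e.g. MAPDISTANCES) are ignored here.
    """
    lead = (p["LABEL"] + p["POPDATA"] + p["POPFLAG"]
            + p["LOCDATA"] + p["PHENOTYPE"] + p["EXTRACOLS"])
    if p["ONEROWPERIND"]:
        # All alleles of an individual sit side by side on one row.
        return p["NUMINDS"], lead + p["PLOIDY"] * p["NUMLOCI"]
    # One row per allele copy, each with NUMLOCI entries.
    return p["NUMINDS"] * p["PLOIDY"], lead + p["NUMLOCI"]


# Values copied from the mainparams file posted above.
posted = dict(NUMINDS=822, NUMLOCI=38987, LABEL=1, POPDATA=1, POPFLAG=0,
              LOCDATA=0, PHENOTYPE=0, EXTRACOLS=1, ONEROWPERIND=1, PLOIDY=2)
print(expected_layout(posted))     # (822, 77977)

# Hypothetical alternative that matches the reported file layout.
candidate = dict(posted, NUMLOCI=38985, EXTRACOLS=0, ONEROWPERIND=0)
print(expected_layout(candidate))  # (1644, 38987)
```

Under these assumptions, 77977 = 3 leading columns + 2 x 38987 allele entries, which is why the expected row length is roughly double the number of loci.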

Vikram Chhatre

Jul 7, 2022, 12:18:07 PM
to structure-software
This is a commonly encountered problem caused by incorrectly formatted line breaks. It usually happens when files are transferred from a Windows machine to a Unix machine, but it can also occur on Unix itself. Here is a simple solution to try:

vim input.str

:set ff?  (the answer should be fileformat=unix)

:set ff=unix

:wq (save and close)

vim input.str

:%s/^M/^M/g  (in vim, each ^M is typed with the keystrokes Ctrl+V, then ENTER)

:wq

Try running structure again.
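
For anyone who prefers a script to an editor session, the same clean-up can be sketched in Python. This is a minimal sketch rather than part of STRUCTURE or PGDSpider: the function names `normalize_line_endings` and `fix_file` are hypothetical, and it assumes the file is small enough to read into memory and that every carriage return in it is a line-ending artifact.

```python
from pathlib import Path


def normalize_line_endings(data: bytes) -> bytes:
    """Convert CRLF (Windows) and lone CR line endings to LF (Unix)."""
    # Replace CRLF first so a CRLF pair does not become two line breaks.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def fix_file(path: str) -> int:
    """Rewrite the file with LF endings.

    Returns the number of carriage returns found; 0 means the file
    was already clean and was left untouched.
    """
    p = Path(path)
    raw = p.read_bytes()
    n_cr = raw.count(b"\r")
    if n_cr:
        p.write_bytes(normalize_line_endings(raw))
    return n_cr
```

Calling `fix_file("input.str")` on the file from the steps above would report how many carriage returns it contained. A return value of 0 suggests that line endings are not the cause of the error.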


Josh Banta

Jul 7, 2022, 12:39:26 PM
to structure...@googlegroups.com
Dear John,

You have reached out to the right folks.

I have tutorials for (a) converting to the STRUCTURE format and (b) installing and running STRUCTURE under Ubuntu Linux. I highly recommend using my tutorial and its R script for the file conversion rather than PGDSpider, which never seems to work properly.

Here are the links:

Incidentally, you could also consider using FASTStructure, but I wouldn't recommend it. It gives results that are very weird. But if you want to try it, here's how to install it and use it:

Best wishes,
Josh Banta

Vikram Chhatre

Jul 7, 2022, 12:44:37 PM
to structure-software
Josh -

Thanks for posting links to your tutorials, which should be helpful.

Two quick notes:

1. PGDSpider does in fact work. I have used it countless times to convert VCF data to various formats including structure and while it can be a bit finicky, I have never had a longstanding problem that would render it unusable.

2. Suggesting that FastStructure gives weird results is not very helpful. Perhaps you can point out specific problems with the method.

V

Josh Banta

Jul 7, 2022, 1:01:53 PM
to structure...@googlegroups.com
Dear Vikram,

First of all, I offer you and others my apology for being too vague.

I should clarify that I am only referring to my personal experiences. PGDSpider is no doubt very useful for many people, and if someone is able to successfully convert their files, I have every reason to expect the converted data are valid. But I have had problems when trying to convert to the STRUCTURE and Arlequin formats, where the converted files simply wouldn't work when loaded into the relevant programs.

Regarding FastStructure: I believe it gave me results suggesting that my populations were homogeneous, whereas the regular STRUCTURE analysis gave me results that accorded with the biology of the species (showing admixture where I expected it to be, showing different inferred ancestral groups, etc.). I believe the example file I use in the FastStructure tutorial is the one that gave me the homogeneous results. As I recall, I used the same file for the STRUCTURE tutorial, and it showed much more genetic structure, which made more biological sense.

The following posts resonated with my experiences with FastStructure:

Best wishes,
Josh
