populations failure: matches.tsv.gz


odrade...@gmail.com

Jul 25, 2014, 5:16:02 AM
to stacks...@googlegroups.com
Dear all,

I've been running Stacks on single-end RADseq data. Everything was fine until populations (I think we have version 18).
It seems unable to find the "matches.tsv" files because it's asking for "matches.tsv.gz" (see e.pop.txt attached).
I tried renaming the files by adding .gz to them, but that did not change anything.
Has anyone else encountered this problem, or does anyone have a solution?

Thanks for your help
Odrade

e.pop.txt

Julian Catchen

Jul 29, 2014, 1:37:32 AM
to stacks...@googlegroups.com, odrade...@gmail.com
Hi Odrade,

If the *.matches.tsv files are not present, then the pipeline failed at
an earlier stage. These files are produced by sstacks and require that a
catalog was successfully built. What is the output from the earlier
parts of your Stacks run? If you used denovo_map.pl, what does the
denovo_map.log file say?

Best,

julian

odrade...@gmail.com

Jul 29, 2014, 8:35:16 AM
to stacks...@googlegroups.com, odrade...@gmail.com, jcat...@uoregon.edu
Hi Julian,
Thanks for the answer.
To be more precise about my problem:
    1. I used gzipped FASTQ files to run process_radtags, and the output was in FASTQ format.
    2. Then ustacks correctly generated the individual files (*.snps.tsv, *.tags.tsv and *.alleles.tsv) and cstacks generated the catalog files (batch_1).
    3. I also suppose that sstacks worked, as *.matches.tsv files were generated. However, populations.sge asks for *.matches.tsv.gz files, which were not generated.
I'm running Stacks on a cluster; could the problem come from that?
Best
Odrade

Celine Reisser

Jul 29, 2014, 10:43:27 AM
to stacks...@googlegroups.com
Hi Odrade, Hi Julian,

Julian, weren't there some parameter changes and pipeline modifications dealing with compressed input/output in the last two or three versions of Stacks?
Maybe the problem does come from the cluster: if, for example, it doesn't have the correct zlib version or path, the files might not be generated properly or might not be readable.
You would have to contact the cluster administrators and ask about the proper package to figure that out.
It's just a thought. Has anyone had such an issue on a cluster before?


C.

Julian Catchen

Jul 29, 2014, 2:34:44 PM
to stacks...@googlegroups.com, odrade...@gmail.com
Hi Odrade,

Celine's answer may be correct. Are the rest of your Stacks files
gzipped (it sounds from your message like they are not)?

Stacks expects either all of the internal files to be gzipped or none
of them. Perhaps you are running an older version of populations against
data generated by newer versions of the pipeline programs?
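
One quick way to check the "all gzipped or none" condition is to count the two kinds of files in the Stacks output directory. The following is only a sketch in Python, not part of Stacks; the function names are made up for illustration, and it assumes that every Stacks file in the directory ends in either .tsv or .tsv.gz.

```python
from collections import Counter
from pathlib import Path


def compression_summary(stacks_dir):
    """Count uncompressed (*.tsv) and gzipped (*.tsv.gz) files in a directory."""
    counts = Counter()
    for path in Path(stacks_dir).iterdir():
        if path.name.endswith(".tsv.gz"):
            counts["gzipped"] += 1
        elif path.name.endswith(".tsv"):
            counts["plain"] += 1
    return counts


def is_mixed(stacks_dir):
    """True when the directory holds both gzipped and uncompressed .tsv files."""
    counts = compression_summary(stacks_dir)
    return counts["gzipped"] > 0 and counts["plain"] > 0
```

If is_mixed() returns True for the directory passed to -P, the files were probably produced by different runs or partly compressed by hand.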

Also, what are the contents of your population map file?

I just tested the populations program with gzipped and uncompressed
files and it works here.

Best,

julian

Sarah Kingston

Dec 11, 2014, 2:31:05 PM
to stacks...@googlegroups.com, odrade...@gmail.com, jcat...@uoregon.edu
I have actually run into the same issue as in the original post. I am running populations on a series of older files that I converted with convert_stacks.pl. At no point in the pipeline were they compressed; I started from .fq files.

populations keeps looking for *matches.tsv.gz files, but my files are *matches.tsv, and I am not sure how to resolve the problem.

Thanks!

Sarah K

Carlos Munoz Ramirez

Dec 20, 2014, 2:17:34 PM
to stacks...@googlegroups.com
Hi All,

I have the same problem. I transferred all the output files from the lab computer to my laptop, on which I installed Stacks only recently. The populations program doesn't find the *.matches.tsv files because it is looking for *.matches.tsv.gz files.

Best,

Carlos.

Julian Catchen

Dec 21, 2014, 5:17:16 PM
to stacks...@googlegroups.com, carm...@umich.edu, kingsto...@gmail.com
Hi Carlos and Sarah,

That error message is a little misleading. The populations program will first try to find the uncompressed files, then it will search for compressed versions. If it finds neither it will print an error, which just happens to include the ".gz" suffix on the filenames because it tried to find compressed versions last.
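
The lookup order described above can be sketched in a few lines of Python. This is not the actual populations code (Stacks is written in C++); the function name and the wording of the error are assumptions for illustration only.

```python
from pathlib import Path


def locate_matches_file(stacks_dir, sample):
    """Return the matches file for a sample, trying plain first, then gzipped."""
    base = Path(stacks_dir) / f"{sample}.matches.tsv"
    candidates = [base, base.with_name(base.name + ".gz")]
    for path in candidates:
        if path.is_file():
            return path
    # The message names the last candidate tried, so it carries the ".gz"
    # suffix even when the uncompressed file was the one expected.
    raise FileNotFoundError(f"Unable to open '{candidates[-1]}'")
```

With this order, a missing or misplaced directory produces an error mentioning ".matches.tsv.gz" even though no compressed file was ever expected, which matches the symptom reported in this thread.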

So, the program is unable to locate your files. Can you specify the command line you executed and where your data are located?

Best,

julian

Sarah Kingston

Dec 30, 2014, 10:59:35 AM
to Julian Catchen, stacks...@googlegroups.com, carm...@umich.edu
Thanks - the (uncompressed) files are in a directory 'stacksC' just below the directory in which I am running the command:

populations -b 2 -P stacksC -M popmap_Lobfinal.txt -s -m 2 -t 1 -a 0.01 -f p_value --fstats --vcf --genepop --plink

Sarah Kingston

Dec 30, 2014, 11:24:35 AM
to Julian Catchen, stacks...@googlegroups.com, carm...@umich.edu
Sorry for the flurry of emails. I reorganized my working files, and now populations v1.22 finds the files, but I run into another error, a segmentation fault (core dumped):

populations -b 1 -P mariposa/converted -M popmap_Lobfinal.txt -s -m 2 -t 1 -a 0.01 -f p_value --fstats --vcf --genepop --plink

Fst kernel smoothing: off
Bootstrap resampling: off
Percent samples limit per population: 0
Locus Population limit: 1
Minimum stack depth: 1
Log liklihood filtering: off; threshold: 0
Minor allele frequency cutoff: 0.01
Applying Fst correction: P-value correction.
Parsing population map.
Found 143 input file(s).
  5 populations found [. . .]
Populating observed haplotypes for 112 samples, 262297 loci.
Segmentation fault (core dumped)


Cheers,
Sarah K

Julian Catchen

Dec 31, 2014, 12:03:02 PM
to stacks...@googlegroups.com, kingsto...@gmail.com
Hi Sarah,

I'm not sure exactly what is going on here, but I would try a couple of things to further troubleshoot:

1) You may be running out of memory, as populations is trying to process about 262,000 loci. I would specify the -p and/or the -r parameters to limit the number of loci it has to look at. You probably want to bring the number down to fewer than 50,000 loci unless you have a lot of memory.

2) Try running the program with the minimal option set, then add the options you are specifying one at a time (-m, -a, --fstats, --vcf, --genepop, --plink). This could help me identify where the problem may be.
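
The "add one option at a time" approach in point 2 can be laid out as a list of command lines before running anything. The Python sketch below only builds and prints the commands. It assumes that -b, -P and -M form the minimal option set, reuses the paths from Sarah's command, and leaves out her -s, -t and -f options because they are not in the list above; the helper name is hypothetical.

```python
import shlex


def incremental_commands(base, options):
    """Build command lines that add one option group at a time."""
    commands = [list(base)]
    for option in options:
        commands.append(commands[-1] + list(option))
    return [shlex.join(cmd) for cmd in commands]


# Assumed minimal invocation, taken from the command posted earlier.
base = ["populations", "-b", "1", "-P", "mariposa/converted",
        "-M", "popmap_Lobfinal.txt"]
# Option groups in the order suggested above.
options = [["-m", "2"], ["-a", "0.01"], ["--fstats"],
           ["--vcf"], ["--genepop"], ["--plink"]]

for line in incremental_commands(base, options):
    print(line)
```

Running the printed commands in order, the first one that crashes points to the option that triggers the problem.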

Best,

julian

Julian Catchen

Feb 7, 2015, 9:14:21 PM
to stacks...@googlegroups.com, Sarah Kingston
Hi Sarah and Everyone,

I have fixed this bug in the code base: the problem occurred when Stacks files existed for an individual, but the *.matches.tsv file was empty, causing a segfault. The populations program now properly ignores non-existent samples, whether they are found in the population map file or as empty files on the disk. A set of empty files can be generated for a sample if essentially no reads for that individual come out of process_radtags: no stacks will ever form, nor any matches to the catalog, so the files will be generated but they will all be empty.
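
For anyone still on a version without this fix, the empty files can be found before running populations. The following Python sketch is not part of Stacks. It assumes that "empty" means zero bytes of data, either as a zero-length plain file or as a gzipped file that decompresses to nothing, and the function names are made up for illustration.

```python
import gzip
from pathlib import Path


def has_no_data(path):
    """True if a plain or gzipped file contains zero bytes of data."""
    if path.name.endswith(".gz"):
        with gzip.open(path, "rb") as handle:
            return handle.read(1) == b""
    return path.stat().st_size == 0


def empty_matches_files(stacks_dir):
    """List the *.matches.tsv and *.matches.tsv.gz files that hold no data."""
    directory = Path(stacks_dir)
    found = list(directory.glob("*.matches.tsv"))
    found += list(directory.glob("*.matches.tsv.gz"))
    return sorted(p.name for p in found if has_no_data(p))
```

Samples reported by empty_matches_files() are candidates for removal from the population map before rerunning populations.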

This fix will be in the next release, due out very soon.


Best,

julian

Sarah Kingston wrote:
Sorry for the flurry of emails. I reorganized my working files, and now populations v1.22 finds the files, but I run into another error, a segmentation fault (core dumped):

populations -b 1 -P mariposa/converted -M popmap_Lobfinal.txt -s -m 2 -t 1 -a 0.01 -f p_value --fstats --vcf --genepop --plink

Fst kernel smoothing: off
Bootstrap resampling: off
Percent samples limit per population: 0
Locus Population limit: 1
Minimum stack depth: 1
Log liklihood filtering: off; threshold: 0
Minor allele frequency cutoff: 0.01
Applying Fst correction: P-value correction.
Parsing population map.
Found 143 input file(s).
  5 populations found [. . .]
  Parsing stacks/sample_TB13.matches.tsv
  Parsing stacks/sample_TB14.matches.tsv
Warning: unable to find any matches in file 'sample_TB14', excluding this sample from population analysis.
  Parsing stacks/sample_TB16.matches.tsv
Warning: unable to find any matches in file 'sample_TB16', excluding this sample from population analysis.
  Parsing stacks/sample_TB19.matches.tsv
Warning: unable to find any matches in file 'sample_TB19', excluding this sample from population analysis.
  Parsing stacks/sample_TB22.matches.tsv
  Parsing stacks/sample_TB23.matches.tsv
Warning: unable to find any matches in file 'sample_TB23', excluding this sample from population analysis.
  Parsing stacks/sample_TB26.matches.tsv
Warning: unable to find any matches in file 'sample_TB26', excluding this sample from population analysis.
  Parsing stacks/sample_TB27.matches.tsv
  Parsing stacks/sample_TB29.matches.tsv
  Parsing stacks/sample_TB30.matches.tsv

LR

Jan 15, 2016, 4:05:19 PM
to Stacks, kingsto...@gmail.com, jcat...@illinois.edu
Hi,

I've generated the *.matches.tsv files through denovo_map.pl in Stacks 1.29. I was getting this same error when running "populations": it was unable to find the *.matches.tsv files because it was looking for *.matches.tsv.gz files, which were not generated by denovo_map.pl.

After reading this post, I realized I was running an older version of Stacks, so I downloaded and installed the new version on our cluster and then tried to use the new version of "populations" to analyze the output generated with denovo_map.pl (from version 1.29). However, I still get the same error. I have checked, and the *.matches.tsv files are not empty. Is there a way to run populations with the existing *.matches.tsv files? The run took about 3 days, and I would prefer not to rerun it in 1.35 just for a preliminary look. I will use the new pipeline when I implement new runs, but right now I'd like to process the data I have if possible.

Thanks.

Linda

Julian Catchen

Jan 16, 2016, 9:03:03 AM
to LR, Stacks
Hi Linda,

Please post the command you tried to run along with the location of your
files. You should have four files per sample in your directory (tags,
alleles, snps, matches), along with the catalog. The files can be either
all gzipped or all uncompressed. That particular error message is generic
and does not say anything about whether your files were compressed.
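
The "four files per sample" rule can be checked with a short script. This Python sketch is not part of Stacks; it assumes the files are named <sample>.<type>.tsv or <sample>.<type>.tsv.gz, as in the file names quoted in this thread, and the function name is made up for illustration.

```python
from pathlib import Path

# The four per-sample file types listed above.
SUFFIXES = ("tags", "alleles", "snps", "matches")


def missing_sample_files(stacks_dir, samples):
    """Map each sample to the file types that are absent from the directory."""
    directory = Path(stacks_dir)
    missing = {}
    for sample in samples:
        absent = []
        for suffix in SUFFIXES:
            plain = directory / f"{sample}.{suffix}.tsv"
            gzipped = directory / f"{sample}.{suffix}.tsv.gz"
            if not (plain.is_file() or gzipped.is_file()):
                absent.append(suffix)
        if absent:
            missing[sample] = absent
    return missing
```

An empty dictionary means every listed sample has all four files; any other result names the samples and file types to look into.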

You can run new versions of Stacks against older pipeline runs without
issue. This only matters when the file format changes, which hasn't
happened in a while.

julian

LR

Jan 17, 2016, 9:07:29 AM
to Stacks, lyr...@gmail.com, jcat...@illinois.edu
Hi Julian,

The .gz extension was absent from all the *.matches.tsv files in the output from the denovo_map.pl run. My files (including POPid) are located in /RADseq/stacks.

The Slurm script is below:

cd /stacks-1.35/bin

./populations -b 1 -P /lr9/RADseq/stacks/ -M /lr9/RADseq/stacks/PopFileLib1_6.txt -r 1 -m 5 -s -t 24 --fstats --bootstrap_fst --fasta --vcf --genepop --structure --write_single_snp --beagle -W ./wl_1000

Thanks.

Linda

Julian Catchen

Jan 17, 2016, 9:30:37 AM
to LR, Stacks
Hi Linda,

I would run your script outside of the job scheduler (e.g. by hand on
the head node) to make sure your file paths are being found properly.
Just run it long enough to see that populations found the files and then
kill it.

What is the output of:

ls /lr9/RADseq/stacks/

julian

LR

Jan 19, 2016, 3:15:33 PM
to Stacks, lyr...@gmail.com, jcat...@illinois.edu
Hi Julian,

The paths should be fine, because the run completes and produces the sample files (*.alleles.tsv, *.snps.tsv, *.tags.tsv, *.matches.tsv), the batch_1.catalog.alleles file (and .snps/.tags/.markers.tsv), and a *population.log file (although I did not specify the populations file when running denovo_map.pl, so that populations file is not useful for my purposes).

Thanks.

Linda

Julian Catchen

Feb 19, 2016, 11:25:29 AM
to stacks...@googlegroups.com, marco....@gmail.com
Hi Marco,

What is the output when you run populations, what are the contents of
your stacks directory, and what are the contents of your population map
file?
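
The directory listing and the population map can also be compared against each other directly. The Python sketch below is not part of Stacks. It assumes a population map with one sample per line, the sample name (the file prefix) and the population separated by a tab, and matches files named <sample>.matches.tsv or <sample>.matches.tsv.gz; the function names are hypothetical.

```python
from pathlib import Path


def read_population_map(popmap_path):
    """Return {sample: population} from a tab-separated population map."""
    samples = {}
    for line in Path(popmap_path).read_text().splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        samples[fields[0].strip()] = fields[1].strip() if len(fields) > 1 else ""
    return samples


def samples_without_matches(popmap_path, stacks_dir):
    """List population-map samples with no matches file in the directory."""
    directory = Path(stacks_dir)
    return [
        sample
        for sample in read_population_map(popmap_path)
        if not any(
            (directory / f"{sample}.matches.tsv{ext}").is_file()
            for ext in ("", ".gz")
        )
    ]
```

Because the line is split on tabs only, a map that uses spaces instead of tabs yields sample names that match no file, so every sample shows up in the result.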

Best,

julian

marco....@gmail.com wrote:
> Dear all,
>
>
> I am having similar problems. When I try to run "populations" it says it
> can't find the matches.tsv.gz files. The files are there and are not
> empty. I am running Stacks 1.27 on a cluster, and in my home directory I
> have a folder with the Stacks files, called stacks, and the population map.
> The command I ran is:
>
> populations -P ./stacks/ -M ./pop_map_1.txt -b 1 -t 4 -m 5 -p 28 --phylip
>
> I used the same command with other data sets months ago and it worked
> fine. Again, the matches files are in the stacks folder and are not empty.
> This dataset was given to me by my supervisor, who started processing
> it with an older version of Stacks (I don't know which one) but got only
> as far as process_radtags. After that I used version 1.27. I cannot use
> the denovo_map.pl script on my cluster, so I have to run ustacks, cstacks and
> sstacks separately. Does anyone have any idea?
>
> Thanks,
>
> Marco
>