populations program not reading catalog file

Brian D.

Jul 30, 2018, 4:22:19 PM
to Stacks
Hello,

I've searched for this issue and only seen a brief mention without a solution, so here goes. I'm using v. 2.1 on a pilot study with 5 samples of ddRAD data. Everything seems to run fine until I run the populations program, which complains that the catalog file is not in fasta format: "Error: './min10reads_Pipeline/catalog.fa.gz': not in fasta format (expected '>')."

The file is a gzipped fasta file from previous steps that is correctly formatted - e.g. I'm able to open the decompressed file in alignment viewers. Unzipping the file for populations to read produces this error: 'Failed to open gzipped file './min10reads_Pipeline/catalog.fa.gz': No such file or directory.'

Obviously it wants a zipped fasta file but maybe the unzipping is not working correctly?

Thanks much for any help!

Brian D.

Nicolas Rochette

Jul 31, 2018, 3:24:00 PM
to Stacks
Hi Brian,

Do you have a populations.log? If yes, could you post it?

Also, can you confirm that the output of

gzip -cd catalog.fa.gz | head -c1 | od -An -a

is ">" ?

Best,

Nicolas

Brian D.

Jul 31, 2018, 4:27:21 PM
to Stacks
Hi Nicolas,

Attached is the log. The output of the one liner is:

bdorsey$ gzip -cd catalog.fa.gz | head -c1 | od -An -a
           >                                                           


After counting spaces and newlines in textwrangler that is:
\s{11}>\s{60}\n\n

Thanks for any advice!

Brian
populations.log

Nicolas Rochette

Jul 31, 2018, 4:49:17 PM
to stacks...@googlegroups.com

Populations (and more generally Stacks when opening a gzipped fasta file) simply does this:

1. Open the file (if it fails, abort)
2. Peek at the first character in the file, check that it is '>' (if it's not, abort with the error you saw)

However at the moment it doesn't explicitly check that peeking at the first character succeeded (peeking may return -1 instead of 60/'>'). That is apparently the case you are in; ZLIB fails to read from the (open) file.
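
To separate these cases from the command line, the same open-then-peek logic can be imitated with the system gzip. This is a sketch only: peek_first_byte is a hypothetical helper, not something Stacks provides, and it goes through the gzip binary rather than through the zlib calls Stacks makes, so it may report "ok" on a machine where populations still fails.

```shell
# Hypothetical helper (not part of Stacks): mimic "open, then peek" and
# report the case that is not yet distinguished, an empty or failed read.
peek_first_byte() {
    file=$1
    # Step 1: the file must exist and be readable.
    [ -r "$file" ] || { echo "cannot open"; return 1; }
    # Step 2: decompress and keep only the first byte, as two hex digits.
    byte=$(gzip -cd "$file" 2>/dev/null | head -c1 | od -An -tx1 | tr -d ' \n')
    case $byte in
        3e) echo "ok" ;;                      # 0x3e is '>'
        "") echo "read failed"; return 2 ;;   # nothing came back (the -1 case)
        *)  echo "unexpected 0x$byte"; return 3 ;;
    esac
}
```

For example, `peek_first_byte ./min10reads_Pipeline/catalog.fa.gz` prints one of the four messages above; only "ok" means the decompressed data starts with '>'.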

My guess would be that several versions of ZLIB coexist on your system and that your $LD_LIBRARY_PATH environment variable does not account for it.

Best,

Nicolas

--
Stacks website: http://catchenlab.life.illinois.edu/stacks/
Brian D.

Jul 31, 2018, 5:57:28 PM
to Stacks
Nicolas,

Thanks for the reply. However, I don't quite follow. Are you saying that stacks sees a '>' as the first character in the catalog file but for some reason ZLIB is not able to read the file? I thought zlib would have to open the file first before any checking of the characters could be done.

Or, is it true that the first character is a space and so stacks is aborting? If this is true, does that mean the catalog file is formatted incorrectly?

In any case, I don't see an $LD_LIBRARY_PATH environment variable when I run 'env'. I am running Stacks on a Mac, and from a Google search it seems Macs might not use this variable. If there are multiple versions of zlib on my system, could you recommend a solution? This is a bit beyond my troubleshooting ability.

Thanks!

Brian

Nicolas Rochette

Jul 31, 2018, 6:48:02 PM
to Stacks
The whitespace in the output of "od -An -a" is just filling, you can entirely ignore it, even newlines (newlines in the input would be "nl"). The first character of your file is indeed a >
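
This is easy to confirm by feeding od some known input. The commands below are only an illustration; the exact amount of padding differs between od implementations, which is one more reason to ignore it.

```shell
# A lone '>' is printed as '>' surrounded by padding.
printf '>' | od -An -a
# A real newline in the input is spelled out as the name "nl".
printf '>\n' | od -An -a
# Stripping blanks and line breaks leaves only the character names.
printf '>\n' | od -An -a | tr -d ' \n'; echo    # prints >nl
```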

However, Stacks succeeds in opening the file with ZLIB but not in reading anything from it, so it doesn't get to see that >

On OSX use DYLD_LIBRARY_PATH instead of LD_LIBRARY_PATH. You can check whether gzip and stacks are using the same ZLIB version with:

otool -L $(command -v gzip)
otool -L $(command -v populations)

Best,

Nicolas

Brian D.

Jul 31, 2018, 7:28:45 PM
to Stacks
Thanks for clarifying. It seems that they are using the same version of zlib (libz.1.dylib v. 1.2.5 - see below). I assume if they are accessing the same version then the DYLD_LIBRARY_PATH variable is fine, no?

Any other suggestions for why zlib wouldn't be able to read the file?

Thanks very much,

Brian

sn01564:dioonPilot bdorsey$ otool -L $(command -v gzip)
/usr/bin/gzip:
    /usr/lib/libbz2.1.0.dylib (compatibility version 1.0.0, current version 1.0.5)
    /usr/lib/liblzma.5.dylib (compatibility version 6.0.0, current version 6.3.0)
    /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.5)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1226.10.1)
sn01564:dioonPilot bdorsey$ otool -L $(command -v populations)
/opt/local/NGS/bin/populations:
    /usr/lib/libz.1.dylib (compatibility version 1.0.0, current version 1.2.5)
    /usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 120.1.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1226.10.1)

Nicolas Rochette

Jul 31, 2018, 7:44:15 PM
to Stacks

Before anything else, let's confirm that my guess about the failed read is right. Could you rebuild Stacks v2.1 after replacing the stacks-2.1/src/gzFasta.cc file with the attached one? Then retry the same populations command and see whether the error message changes.

Nicolas

gzFasta.cc

Brian D.

Jul 31, 2018, 8:13:47 PM
to Stacks
I rebuilt Stacks with the new gzFasta.cc file and ran the same populations command. I get the same error message.

Brian
populations.log

Brian D.

Aug 9, 2018, 7:00:44 PM
to Stacks
Hello again,

I haven't seen anything new about this issue so I thought I would ask again. Any advice since the new gzFasta.cc file had no effect?

Thanks very much,
Brian D.

Nicolas Rochette

Aug 9, 2018, 10:07:51 PM
to stacks...@googlegroups.com

Hi Brian,

Could you try again with this file, please? I've also enriched the error message itself, i.e. even if the same error happens, it will print more info than it did before.

Best,

Nicolas

gzFasta.cc

Brian D.

Aug 13, 2018, 4:40:04 PM
to Stacks
Hi Nicolas,

I finally had a chance to try this. Here is the error message I get now:

Error: './min10readsCutsites_Pipeline/catalog.fa.gz': not in fasta format (expected '>', got 0x2e '.').

I searched for a '.' at the beginning of a line in the catalog file but didn't find one:

sn01564:dioonPilot bdorsey$ gunzip -c ./min10readsCutsites_Pipeline/catalog.fa.gz | grep '^\.'
sn01564:dioonPilot bdorsey$

Any ideas?

Thanks again
Brian D.

Nicolas Rochette

Aug 14, 2018, 11:31:51 AM
to Stacks

Hi Brian,

Thanks for coming back to us again. I'm glad we could pin down the origin for certain. This is really puzzling though.

The error happens immediately after the file is opened, when the gz-FASTA reader peeks at the first character to check whether it is a '>'. This part of the code is really simple and easy to trace; all it does is:

gzFile gzf = gzopen(path, "rb");
if(!gzf) { ... (error: failed to open file) }
int first = gzgetc(gz_fh);
if(first != '>') { ... (error: not in fasta format) }

I'll see if I can come up with an idea for the cause. It works elsewhere so it somehow must be a system-specific issue, but I can't think of a specific one for the moment.

Best,

Nicolas

Nicolas Rochette

Aug 14, 2018, 11:36:30 AM
to Stacks

(p.s.—in the snippet 'gz_fh' should read 'gzf'; naming in the actual code is consistent.)

Brian D.

Sep 5, 2018, 12:25:50 PM
to Stacks
Hello again Nicolas,

I have a bit more information regarding this issue. I have tried the pipeline on my macbook and it works just fine. It is running the same operating system as my desktop box (OS 10.11.6) and the same perl version (5.22.0). Also, the error message suggests that, on the desktop machine, the gz-FASTA reader is including the full path of the fasta file in its check for a '>' character as the first one.

Here are two calls to populations and the associated error messages for catalog files in separate directories. Note that the character found is the first character of the path provided to populations via the -P flag, not the first character of the file itself.

/usr/local/bin/populations -P M1 -M ../info/pilot2Pops.txt -t 20
Error: 'M1/catalog.fa.gz': not in fasta format (expected '>', got 0x4d 'M').

populations -P ./min10readsCutsites_Pipeline -M ./pilot2Pops.txt -t 18
Error: './min10readsCutsites_Pipeline/catalog.fa.gz': not in fasta format (expected '>', got 0x2e '.').
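
The hex codes in the two messages can be checked against the paths by hand. A small sketch, where first_hex is a hypothetical helper that prints the first byte of its standard input as two hex digits:

```shell
# Hypothetical helper: print the first byte of stdin as two hex digits.
first_hex() {
    head -c1 | od -An -tx1 | tr -d ' \n'
    echo
}

# First character of each catalog path named in the error messages.
printf '%s' 'M1/catalog.fa.gz' | first_hex                              # 4d
printf '%s' './min10readsCutsites_Pipeline/catalog.fa.gz' | first_hex   # 2e
# First character a FASTA file is expected to start with.
printf '>' | first_hex                                                  # 3e
```

Running `gzip -cd M1/catalog.fa.gz | first_hex` gives the first byte of the decompressed catalog itself, for comparison with the values above.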

I hope this is helpful in tracking down the problem. Please let me know if I can provide more info.

Cheers
Brian D.

Nicolas Rochette

Sep 5, 2018, 1:28:56 PM
to Stacks

Hi Brian,

?!? very surprising...

But definitely helpful. I think it explains it and that we should be able to track down the issue. I'll keep you updated.

Cheers,

Nicolas

nicolef...@gmail.com

Feb 27, 2019, 11:12:54 AM
to Stacks
Was this issue ever solved? I am having the same problem as Brian: catalog.fa.gz': not in fasta format (expected '>', got 0x2e '.').

-Nicole

Cinnamon Mittan

Oct 21, 2019, 12:28:35 PM
to Stacks
Hi Brian and Nicole,

I am also having this issue using v2.3e. Is there a solution?

Thanks!

Cinnamon 

Nicolas Rochette

Oct 21, 2019, 1:52:52 PM
to Cinnamon Mittan, stacks Users List

Hi Cinnamon.

v2.4 will print additional details when this error occurs.

As far as we know, the error comes from a problem in your system's configuration: the Z library that is found first at run time (zlib, the standard for reading .gz files) is older than the one that was used for compiling Stacks. Typically you will want to fix your LD_LIBRARY_PATH (e.g. by hand or by loading modules).
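
Fixing the variable by hand might look like the sketch below. Both the helper name prepend_lib_dir and the directory /opt/zlib/lib are made up for illustration; substitute the location of the zlib that Stacks was compiled against. As noted earlier in the thread, on OSX the variable is DYLD_LIBRARY_PATH and the check is otool -L rather than ldd.

```shell
# Hypothetical helper: print a library search path with directory $1 placed
# first, keeping whatever LD_LIBRARY_PATH already contains.
prepend_lib_dir() {
    printf '%s\n' "$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Usage, assuming the wanted zlib lives in /opt/zlib/lib (a made-up path):
#   export LD_LIBRARY_PATH="$(prepend_lib_dir /opt/zlib/lib)"
#   ldd "$(command -v populations)" | grep libz   # Linux: which libz is resolved
```

Prepending rather than overwriting keeps any directories that other programs or loaded modules already rely on.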

Best,

Nicolas
