I’m trying to run maximum-likelihood parameter estimation on a 3-population folded multi-dimensional SFS in fastsimcoal2 (fsc28) on Windows, but I keep hitting a parsing/consistency error before inference starts.
Context
OS: Windows
Version: fsc28 (fastsimcoal2)
Data: folded 3D multiSFS (“FREQ” format) for 3 demes with sample sizes 4,4,4, i.e. (4+1) × (4+1) × (4+1) = 125 entries in the SFS
Objective: compare demographic models (strict divergence vs gene flow vs hybrid origin) using likelihood/AIC.
Observed SFS header (first two lines)
"1 observations. No. of demes and sample sizes are on next line."
"3 4 4 4"
The SFS entries are numeric (not necessarily integers) and sum to 39 in my current file (projection/scaling may be involved in how the SFS was generated).
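For anyone who wants to reproduce the basic checks on the .obs file, here is a minimal Python sketch. It assumes only the layout described above (one header line, one line with the number of demes and sample sizes, then whitespace-separated entries). The function name `summarize_msfs` and the toy data are hypothetical and are not my actual file.

```python
def summarize_msfs(text):
    """Summarize a multi-SFS .obs file passed in as a string.

    Assumed layout: line 1 = header text, line 2 = 'n_demes n1 n2 ...',
    remaining lines = whitespace-separated SFS entries.
    """
    lines = text.strip().splitlines()
    fields = lines[1].split()
    n_demes = int(fields[0])
    sample_sizes = [int(x) for x in fields[1:1 + n_demes]]

    # Flatten every remaining line into one list of numeric entries.
    entries = [float(tok) for line in lines[2:] for tok in line.split()]

    # A joint SFS over demes with sample sizes n_i has prod(n_i + 1) cells.
    expected = 1
    for n in sample_sizes:
        expected *= n + 1

    return {
        "sample_sizes": sample_sizes,
        "expected_entries": expected,
        "found_entries": len(entries),
        "total": sum(entries),
        "all_integer": all(x.is_integer() for x in entries),
    }


# Toy data only (NOT my real SFS): 125 cells, two of them non-integer.
toy = (
    "1 observations. No. of demes and sample sizes are on next line.\n"
    "3 4 4 4\n"
    + " ".join(["0.5", "0.5"] + ["0"] * 123)
)
report = summarize_msfs(toy)
print(report)
```

On a real file, the text would be read with `open(path).read()` and passed to the same function. The `all_integer` field makes it easy to see whether the SFS has been scaled or projected.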
Problem
When I run fsc28 with:
-t H3_hybrid.tpl
-e H3_hybrid.est
--multiSFS -M -m -u -n 50 -L 40
from within the directory that contains H3_hybrid.tpl, H3_hybrid.est, and H3_hybrid_MSFS.obs, the program exits with:
“Number of listed linkage blocks (1) does not correspond to that mentioned in input file (2270). Check your input file.
Exiting program… !
Unable to read file H3_hybrid.tpl
Unable to read input files. Exit program.”
What I have checked/tried
The template is internally consistent: it declares 1 contiguous linkage block and lists exactly 1 FREQ line.
I verified in R that:
number of lines starting with “FREQ” in the tpl = 1
the number after “//Per chromosome: Number of contiguous linkage Block” = 1
I previously tried setting the number of contiguous linkage blocks to the value the error reports as “mentioned in input file” (e.g., 866) and listing that many FREQ lines. On subsequent runs, however, the reported value changed (I have seen 130, 866, 2002, and 2270), so I could not converge on a stable configuration.
I also suspected Windows path/quoting issues caused by spaces in directory names, so I now run fsc28 from the model directory and pass relative filenames; the tpl/est paths should therefore not be the issue.
I confirmed the MSFS has 125 numeric entries (scan() from line 3 onward in R), and the header indicates 1 observation and sample sizes 4 4 4.
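The template check described above (declared block count versus number of FREQ lines) can be sketched in Python as follows. This is an illustrative equivalent, not the R code I ran. The function name `check_tpl_blocks` is hypothetical, and the toy fragment uses placeholder values rather than the contents of my H3_hybrid.tpl.

```python
def check_tpl_blocks(
    tpl_text,
    marker="//Per chromosome: Number of contiguous linkage Block",
):
    """Return (declared_block_count, number_of_FREQ_lines) for a .tpl string."""
    lines = [ln.strip() for ln in tpl_text.splitlines()]

    declared = None
    for i, ln in enumerate(lines):
        if ln.startswith(marker):
            # The declared count is taken from the first non-empty line
            # following the comment line.
            for nxt in lines[i + 1:]:
                if nxt:
                    declared = int(nxt.split()[0])
                    break
            break

    # Count block definitions, i.e. lines that start with the FREQ keyword.
    n_freq = sum(1 for ln in lines if ln.startswith("FREQ"))
    return declared, n_freq


# Toy fragment with placeholder values (the mutation rate is made up).
toy_tpl = """//Per chromosome: Number of contiguous linkage Block: a block is a set of contiguous loci
1
//per Block: data type, number of loci, per generation recombination and mutation rates
FREQ 1 0 1e-8 OUTEXP
"""
declared, n_freq = check_tpl_blocks(toy_tpl)
print(declared, n_freq)
```

If the two returned numbers agree, the block section is at least self-consistent, which is the situation I am in when the error appears.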
Question
Could you advise what fastsimcoal2 expects for the “Number of contiguous linkage Block” value and the “number of loci” field in the FREQ line when using a folded multiSFS .obs file whose entries are not raw integer SNP counts (e.g., if the SFS is normalized or scaled)?
Specifically:
Does fastsimcoal2 require the SFS bins to be integer counts?
If the SFS is scaled/normalized, how should the “number of loci” in the FREQ line be specified so that the parser doesn’t infer an inconsistent “mentioned in input file” linkage block count?
Is there a recommended way to indicate the observed .obs filename explicitly (e.g., via .par) for multiSFS estimation, and what is the exact expected .par format for this use case?
If useful, I can share the full .tpl/.est/.obs files (or paste the last section of the tpl containing the loci/block definitions) and the exact command line I’m running.
Thank you for any guidance,
Barsa Das
PhD student,
Museum and Institute of Zoology, Polish Academy of Sciences