The SNPs are aligned in a line at the top in the Manhattan plot


Shaogan Wang

Jun 12, 2026, 9:07:10 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hi everyone,
I want to perform GWAS on 239 maize inbred lines using 2.88 million cleaned SNPs in TASSEL 5. I downloaded and installed tassel-5-standalone on Google Cloud Platform (Ubuntu 26 server) and ran the following command:

run_pipeline.pl -Xmx10g \
  -fork1 -plink -ped ./maize_tassel_clean.ped -map ./maize_tassel_clean.map \
  -fork2 -r ./tassel_phe.txt \
  -fork3 -r ./tassel_cov.txt \
  -fork4 -kinship -method Centered_IBS -input1 \
  -combine5 -input1 -input2 -input3 -input4 -intersect \
  -mlm -mlmVarCompEst P3D -export tassel_Sac_gwas_output

When I received the output files, I plotted the data to generate Manhattan and Q-Q plots. Both plots look very odd, as shown in the figures: the SNPs are aligned in a line at the top of the Manhattan plot. Could you tell me what the problem is and how to fix it? I really appreciate your kind help!
Attachments: gwas_manhattan_plot.png, gwas_qq_plot.png
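A line of points at the top of a Manhattan plot often means that many markers share exactly the same p-value. The following is a minimal sketch of that check, not part of the original workflow: the function name `most_common_p_value` and the example values are hypothetical, and in practice the input would be the `p` column of the MLM statistics file.

```python
from collections import Counter


def most_common_p_value(p_values):
    """Return (p_value, count, fraction) for the most frequent p-value."""
    if not p_values:
        raise ValueError("p_values is empty")
    counts = Counter(p_values)
    value, count = counts.most_common(1)[0]
    return value, count, count / len(p_values)


# Hypothetical p-values for illustration only.
example = [1e-9, 1e-9, 1e-9, 0.02, 0.4, 0.73]
value, count, fraction = most_common_p_value(example)
print(f"{count} of {len(example)} markers share p = {value} ({fraction:.0%})")
```

If a large fraction of markers is tied at one value, the line in the plot reflects that tie rather than many independent strong associations.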

Brandon Monier

Jun 12, 2026, 10:59:23 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hello,

Are you running any filtration steps before you run MLM to account for MAF and sites found in most of your 239 lines?

Best,
Brandon M.
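The two filters asked about here, minor allele frequency (MAF) and the proportion of lines with a called genotype at a site, can be sketched as follows. This is an illustration under stated assumptions, not TASSEL or PLINK code: genotypes are assumed to be coded as alternate-allele counts (0, 1, 2) with `None` for missing calls, and the function names and the 0.05 / 0.95 thresholds are illustrative.

```python
def site_stats(genotypes):
    """Return (call_rate, maf) for one site.

    genotypes: list of alternate-allele counts (0, 1, 2), None = missing.
    """
    if not genotypes:
        return 0.0, 0.0
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    if not called:
        return call_rate, 0.0
    # Each called diploid genotype contributes two alleles.
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)
    return call_rate, maf


def passes_filters(genotypes, min_maf=0.05, min_call_rate=0.95):
    """True if the site meets both (illustrative) thresholds."""
    call_rate, maf = site_stats(genotypes)
    return call_rate >= min_call_rate and maf >= min_maf


# Hypothetical sites for illustration.
common_site = [0] * 10 + [2] * 10   # MAF 0.5, no missing calls
rare_site = [0] * 19 + [1]          # MAF 0.025, no missing calls
print(passes_filters(common_site), passes_filters(rare_site))
```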

Shaogan Wang

Jun 12, 2026, 11:22:07 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
Hello Brandon,

Thanks so much for your question. Actually, the 2.8 million SNPs were filtered from the raw dataset using the following command lines:
plink --allow-extra-chr \
  --biallelic-only strict \
  --double-id \
  --geno 0.05 \
  --maf 0.05 \
  --make-bed \
  --mind 0.05 \
  --out maize239_plink_qc \
  --snps-only \
  --vcf all_chrs_239.vcf.gz
 

plink --bfile maize239_plink_qc \
  --indep-pairwise 50 10 0.2 \
  --out maize239

plink --bfile maize239_plink_qc \
  --extract maize239.prune.in \
  --make-bed \
  --out maize239_pruned

Shaogan Wang

Jun 12, 2026, 11:26:36 AM
to TASSEL - Trait Analysis by Association, Evolution and Linkage
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from bioinfokit import visuz

# 1. Target files
gwas_input_file = "tassel_Sac_gwas_output2.txt"
manhattan_output = "gwas_manhattan_plot"
qq_output = "gwas_qq_plot.png"

print(f"Loading GWAS statistics from: {gwas_input_file}...")
if not os.path.exists(gwas_input_file):
    # Exit with a non-zero status so the failure is visible to the shell
    raise SystemExit(f"Error: Could not find file '{gwas_input_file}' in this folder.")

# 2. Load the file
df = pd.read_csv(gwas_input_file, sep="\t")

# Enforce numerical formatting first; values that cannot be parsed become NaN
for col in ("Chr", "Pos", "p"):
    df[col] = pd.to_numeric(df[col], errors="coerce")

# Filter out the 'None' intercept row AND the 1e-9 TASSEL mathematical ceiling,
# then drop rows with missing or unparseable values
df_filtered = df[(df["Marker"] != "None") & (df["p"] != 1e-9)].copy()
df_filtered = df_filtered.dropna(subset=["Marker", "Chr", "Pos", "p"])

print(f"Processed {len(df_filtered)} true SNPs (excluding 1e-9 artifacts).")

# 3. Draw Manhattan Plot using bioinfokit
print("Generating clean Manhattan Plot...")
visuz.marker.mhat(
    df=df_filtered,
    chr="Chr",
    pv="p",
    gwas_sign_line=True,
    gwasp=5e-8,
    figtype="png",
    figname=manhattan_output,
    dim=(12, 6),
    dotsize=5,
    valpha=0.7
)
print(f"Manhattan plot successfully saved: {manhattan_output}.png")

# 4. Generate Q-Q Plot manually (Fixed Sorting Alignment)
print("Generating Q-Q Plot manually...")
n_snps = len(df_filtered)

# Sort observed p-values descending (so -log10 values are ascending: 0 -> max)
p_observed = np.sort(df_filtered["p"].values)[::-1]
log_p_observed = -np.log10(p_observed)

# Compute expected uniform p-values in descending order (so -log10 values are ascending: 0 -> max)
p_expected = np.arange(n_snps, 0, -1) / (n_snps + 1)
log_p_expected = -np.log10(p_expected)

# Render the plot
plt.figure(figsize=(6, 6))
plt.scatter(log_p_expected, log_p_observed, c="crimson", s=5, alpha=0.6, zorder=2)

# Draw the 45-degree diagonal null hypothesis line
max_val = max(max(log_p_expected), max(log_p_observed))
plt.plot([0, max_val], [0, max_val], color="black", linestyle="--", linewidth=1.5, zorder=1)

# Formatting
plt.title("GWAS Quantile-Quantile (Q-Q) Plot", fontsize=12, fontweight="bold")
plt.xlabel(r"Expected $-\log_{10}(P)$", fontsize=10)
plt.ylabel(r"Observed $-\log_{10}(P)$", fontsize=10)
plt.grid(True, which="both", linestyle=":", alpha=0.5)
plt.tight_layout()

# Save the final rendering
plt.savefig(qq_output, dpi=300)
plt.close()
print(f"Q-Q plot successfully saved: {qq_output}")

This is the script I used to create the Manhattan and Q-Q plots.
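A Q-Q plot is often summarized by the genomic inflation factor (lambda), the ratio of the median observed chi-square statistic (1 degree of freedom) to its expected median under the null. The sketch below is an addition to the script above, not part of it: the function name `genomic_inflation` is hypothetical, it uses only the Python standard library, and the example p-values are synthetic.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()
# Median of a chi-square distribution with 1 df, about 0.455.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.25) ** 2


def genomic_inflation(p_values):
    """Estimate lambda from p-values; values outside (0, 1] are ignored."""
    valid = [p for p in p_values if 0.0 < p <= 1.0]
    if not valid:
        raise ValueError("no valid p-values")
    # The p-to-chi-square transform is monotone, so the median statistic
    # is the statistic of the median p-value: chi2 = (Phi^-1(p / 2))^2.
    z = _STD_NORMAL.inv_cdf(median(valid) / 2.0)
    return (z * z) / CHI2_1DF_MEDIAN


# Synthetic, evenly spaced p-values that mimic a uniform null distribution.
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(f"lambda = {genomic_inflation(null_like):.3f}")
```

A value near 1 is consistent with the null distribution for most markers; values well above 1 suggest inflated test statistics.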

Atit Parajuli

Jun 12, 2026, 12:01:03 PM
to tas...@googlegroups.com
What does the PC1 vs. PC2 plot look like?
