The genome build does seem to be at the root of the problem. But something is odd:
As I mentioned, my query stems from the same file (GRCh37) being uploaded twice. The first time, most RSIDs were missing, and I noticed slight differences in the gene names mapped to the same CHR:POS:REF:ALT as well. The second time, all the RSIDs and gene names came out as expected. I've just now uploaded it again, but this time I intentionally MIS-selected GRCh38 as the build, and the result ended up being exactly the same as the suspect first run (in terms of spotty RSIDs and the gene name assignments). Of course, in this newer submission the Manhattan plot says "Build: GRCh38" at the top, since that is what I selected. But the original problematic run (which, again, contains exactly the same suspect RSIDs/gene names) says "Build: GRCh37" at the top. That is, the Manhattan plot says GRCh37, but the data were clearly prepped/annotated using GRCh38.
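In case it helps anyone reproduce this kind of comparison, here is a rough sketch of how the annotation agreement between two runs could be scored. It assumes each run's annotated variants have been exported as rows with the hypothetical column names chrom, pos, ref, alt, rsid and gene (these names are my own placeholders, not anything my.locuszoom defines), and the rows shown are toy values, not data from my file.

```python
# Sketch only: column names and the toy rows below are illustrative assumptions.

def variant_key(row):
    """Build a CHR:POS:REF:ALT key for one annotated variant row."""
    return f"{row['chrom']}:{row['pos']}:{row['ref']}:{row['alt']}"


def index_annotations(rows):
    """Map each variant key to its (rsid, gene) annotation pair."""
    return {variant_key(r): (r.get("rsid"), r.get("gene")) for r in rows}


def annotation_agreement(run_a, run_b):
    """Fraction of shared variants whose rsid and gene both match."""
    a = index_annotations(run_a)
    b = index_annotations(run_b)
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    same = sum(1 for k in shared if a[k] == b[k])
    return same / len(shared)


# Toy rows standing in for three uploads of the same file.
suspect_run = [
    {"chrom": "1", "pos": 100, "ref": "A", "alt": "G", "rsid": None, "gene": "GENE_X"},
    {"chrom": "1", "pos": 200, "ref": "C", "alt": "T", "rsid": "rs_toy_2", "gene": "GENE_Y"},
]
deliberate_grch38_run = [dict(r) for r in suspect_run]
good_grch37_run = [
    {"chrom": "1", "pos": 100, "ref": "A", "alt": "G", "rsid": "rs_toy_1", "gene": "GENE_Z"},
    {"chrom": "1", "pos": 200, "ref": "C", "alt": "T", "rsid": "rs_toy_2", "gene": "GENE_Y"},
]

print(annotation_agreement(suspect_run, deliberate_grch38_run))
print(annotation_agreement(suspect_run, good_grch37_run))
```

Full agreement between the suspect run and a deliberate GRCh38 run, together with lower agreement against the known-good GRCh37 run, is the pattern described above.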
Does the "Build: " label at the top of the Manhattan plot definitely reflect what the user selected on the upload page?
Is there some way that a job's processing (which build to reference for RSID/gene mapping) could get mismatched from the user selection?
Again, the suspect run is GWAS 157370 from Jan 12. The ingest log looks unremarkable:
[ingest][2024-01-12T15:46:54+00:00] Performing upload step: Calculate SHA256
[success][2024-01-12T15:46:57+00:00] Step completed
[success] The GWAS file passed validation. Read the logs carefully, in case any specific lines failed to parse.
[ingest][2024-01-12T15:46:57+00:00] Performing upload step: Normalize GWAS file format
[success][2024-01-12T15:55:49+00:00] Step completed
[ingest][2024-01-12T15:55:50+00:00] Performing upload step: QQ plots and top hit detection
[success][2024-01-12T16:03:50+00:00] Step completed
[ingest][2024-01-12T16:03:50+00:00] Performing upload step: Prepare a manhattan plot
[success][2024-01-12T16:08:38+00:00] Step completed
Thanks for any info/thoughts.
Sorry to hear that my.locuszoom is being orphaned.