Preprocessing of BrainSpan dataset

Surati Kumari

Apr 8, 2026, 5:42:49 AM

to FUMA GWAS users

Hi,

I am trying to replicate FUMA's gene-property analysis with the BrainSpan dataset on my local system. For this I downloaded "RNA-Seq Gencode v10 summarized to genes" from the BrainSpan Atlas (genes_matrix_csv.zip, which contains expression_matrix, column_metadata and row_metadata).
I tried to follow the pre-processing steps described by FUMA:

"Primary gene ID was Ensemble ID. In total, 524 samples were available. General developmental stages were annotated for each sample based on the age. We used 11 developmental stages and 29 ages as the label. For the label of age, we excluded age groups with <3 samples (25 pcw and 35 pcw). From 52,376 annotated genes, genes were filtered on such that average RPKM per label is >1 in at least one of the either developmental stage or age. This resulted in 19,601 and 21,001 genes for developmental stages and age groups, respectively. RPKM was winsorized at 50 (replaced RPKM>50 with 50). Then average of log transformed RPKM with pseudocount 1 (log2(RPKM+1)) per label (for either 11 developmental stages or 29 age groups) was used as the covariates conditioning on the average across all the labels."
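For reference, the steps quoted above could be sketched roughly as follows. This is only a minimal illustration in Python, not FUMA's actual code (which is not shown in this thread). The function name `build_covariates`, the input layout and the "Average" key are hypothetical. The order of operations is also an assumption: the gene filter is applied to the raw RPKM means before winsorizing, and the "average across all the labels" is taken as the mean of the per-label averages.

```python
import math
from collections import defaultdict


def build_covariates(expr, labels, min_samples=3, rpkm_cutoff=1.0, cap=50.0):
    """Toy version of the quoted preprocessing (hypothetical helper).

    expr:   dict mapping gene ID -> list of RPKM values, one per sample
    labels: list with one label (age or developmental stage) per sample,
            in the same order as the RPKM values
    """
    # Group sample indices by label; drop labels with fewer than min_samples.
    groups = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    groups = {lab: idxs for lab, idxs in groups.items() if len(idxs) >= min_samples}

    covar = {}
    for gene, values in expr.items():
        # Keep the gene only if its mean raw RPKM is > cutoff in at least one label
        # (assumption: the filter uses raw, un-winsorized RPKM).
        raw_means = [sum(values[i] for i in idxs) / len(idxs) for idxs in groups.values()]
        if not any(mean > rpkm_cutoff for mean in raw_means):
            continue

        # Winsorize at `cap`, then log2-transform with pseudocount 1.
        logged = [math.log2(min(v, cap) + 1.0) for v in values]

        # Average of log2(RPKM + 1) per label.
        row = {lab: sum(logged[i] for i in idxs) / len(idxs) for lab, idxs in groups.items()}

        # Assumption: the conditioning covariate is the mean of the per-label averages.
        row["Average"] = sum(row.values()) / len(row)
        covar[gene] = row
    return covar


# Tiny made-up example: label "c" has only 2 samples and is dropped.
toy_expr = {"geneA": [0, 0, 0, 3, 3, 3, 100, 100], "geneB": [0.5] * 8}
toy_labels = ["a", "a", "a", "b", "b", "b", "c", "c"]
toy_covar = build_covariates(toy_expr, toy_labels)
```

Differences in any of these choices (filter before or after winsorizing, how the overall average is computed, which samples are dropped) would change the covariate file and therefore the MAGMA results, so they are worth comparing against one's own script.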

I have attached the code that I used to pre-process the BrainSpan dataset. However, the results of running the gene-property analysis with MAGMA differ from those of FUMA's gene-property analysis. I believe I have made an error in the pre-processing. (Please ignore the "S" pasted in front of the column numbers in the expression matrix header.)

Could you please help? I would be really grateful. Thank you in advance for your time and consideration.

Just a suggestion: it would be helpful if the preprocessed BrainSpan dataset were also available in the Download section, alongside other files such as the preprocessed cell type datasets.

Thank you & kind regards,
Surati

brainspan_magma_covar.R

Tanya Phung

Apr 8, 2026, 10:56:10 AM

to FUMA GWAS users

Hi Surati,

Unfortunately, I won't be able to check what is going wrong in your replication of the BrainSpan preprocessing.

However, I can make the preprocessed BrainSpan data that FUMA uses available for download. I will announce it when it is ready.

Best,
Tanya

Surati Kumari

Apr 9, 2026, 2:17:39 PM

to FUMA GWAS users

Hi Tanya,

Thanks for responding. I understand. However, would it be possible to find out which version of the BrainSpan data FUMA uses internally? On https://www.brainspan.org there are two datasets: v10, which I used locally, and v3 (which has now been superseded). Knowing the version might help me.

Thanks & kind regards

Tanya Phung

Apr 9, 2026, 4:24:04 PM

to FUMA GWAS users

Hi Surati,

Unfortunately, I don't know. The data was processed years ago by a colleague who is no longer with the project. I plan to reproduce these "reference files" soon, so stay tuned.

Best,
Tanya
