Hi,
I am trying to replicate the gene-property analysis that FUMA performs with the BrainSpan dataset on my local system. To do so, I downloaded "RNA-Seq Gencode v10 summarized to genes" from the BrainSpan Atlas (genes_matrix_csv.zip, which contains expression_matrix, column_metadata and row_metadata).
I tried to follow the pre-processing steps described by FUMA:
"Primary gene ID was Ensemble ID. In total, 524 samples were available. General developmental stages were annotated for each sample based on the age. We used 11 developmental stages and 29 ages as the label. For the label of age, we excluded age groups with <3 samples (25 pcw and 35 pcw). From 52,376 annotated genes, genes were filtered on such that average RPKM per label is >1 in at least one of the either developmental stage or age. This resulted in 19,601 and 21,001 genes for developmental stages and age groups, respectively. RPKM was winsorized at 50 (replaced RPKM>50 with 50). Then average of log transformed RPKM with pseudocount 1 (log2(RPKM+1)) per label (for either 11 developmental stages or 29 age groups) was used as the covariates conditioning on the average across all the labels."
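As I read them, the quoted steps amount to roughly the following. This is only a minimal Python sketch of my interpretation, not FUMA's actual code and not the attached script. The function name `preprocess`, its parameters, and the in-memory input layout (a dict of per-sample RPKM lists plus a parallel list of labels) are hypothetical. I am also assuming that the gene filter is applied to the raw RPKM before winsorizing, and that "average across all the labels" means the mean of the per-label averages; either assumption could be where my results diverge.

```python
import math
from collections import defaultdict


def preprocess(expr, labels, min_samples=3, rpkm_cutoff=1.0, cap=50.0):
    """Sketch of the quoted steps (all names here are hypothetical).

    expr   : dict mapping gene ID -> list of RPKM values, one per sample
    labels : list with one label (developmental stage or age) per sample
    Returns a dict mapping gene ID -> {label: mean log2(RPKM+1), "Average": ...}
    """
    # Collect the sample indices that belong to each label.
    groups = defaultdict(list)
    for i, label in enumerate(labels):
        groups[label].append(i)

    # Exclude labels with fewer than min_samples samples
    # (the quote does this for the age labels, e.g. 25 pcw and 35 pcw).
    groups = {lab: idx for lab, idx in groups.items() if len(idx) >= min_samples}

    result = {}
    for gene, values in expr.items():
        # Keep a gene only if its mean raw RPKM exceeds the cutoff in at least one label.
        # Assumption: the filter uses raw RPKM, before winsorizing.
        raw_means = [sum(values[i] for i in idx) / len(idx) for idx in groups.values()]
        if not raw_means or max(raw_means) <= rpkm_cutoff:
            continue

        # Winsorize at `cap`, then transform with log2(RPKM + 1).
        logged = [math.log2(min(v, cap) + 1.0) for v in values]

        # Average the transformed values per label.
        per_label = {
            lab: sum(logged[i] for i in idx) / len(idx) for lab, idx in groups.items()
        }

        # Assumption: the conditioning covariate is the mean of the per-label averages.
        per_label["Average"] = sum(per_label.values()) / len(groups)
        result[gene] = per_label
    return result


if __name__ == "__main__":
    # Toy input: label "c" has a single sample and is dropped; gene G2 fails the filter.
    toy_expr = {"G1": [0, 0, 0, 3, 3, 3, 100], "G2": [0.5] * 7}
    toy_labels = ["a", "a", "a", "b", "b", "b", "c"]
    for gene, covariates in preprocess(toy_expr, toy_labels).items():
        print(gene, covariates)
```

The resulting per-label columns plus the "Average" column would then be written out as the covariate file for MAGMA, with "Average" used as the conditioning covariate.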
I have attached the code that I used to pre-process the BrainSpan dataset. However, the results I get from running the gene-property analysis with MAGMA differ from those of the FUMA gene-property analysis. I believe I have made an error somewhere in the pre-processing. (Please ignore the "S" that is pasted in front of the column numbers in the expression matrix header.)
Could you please help me find the mistake? I would be really grateful. Thank you in advance for your time and consideration.
Just a suggestion: it would be helpful if the pre-processed BrainSpan dataset were also available in the Download section, alongside the other files such as the pre-processed cell type datasets.
Thank you & kind regards,
Surati