Hi,
Thanks for responding. I still have a few questions, though. I am trying both approaches: running the analysis on the web server separately for different datasets, and running it locally.
Issue with the trial-and-error approach:
For example, for one of my GWAS files (trait A), I ran the cell type analysis separately for each scRNA-seq dataset for steps 1, 2 and 3, and was able to identify which datasets were throwing errors. Since I am doing a comparative analysis, I need to use the same scRNA-seq datasets for all my GWAS files. So I ran the cell type analysis for my other GWAS file (trait B) using only the scRNA-seq datasets that had successfully returned results for trait A. Surprisingly, this job failed with Error:Timeout. I thought it might be because I had selected all three steps, so I ran the same GWAS file for step 1 only (Job ID: 707563), and it still failed with the same Error:Timeout. However, when I ran the same GWAS file against all the Brain scRNA-seq datasets without excluding any (Job ID: 707698), it completed without any errors. How can a job with fewer cell type datasets take longer than a job with a larger set of cell type datasets?
Questions about running locally:
I first ran Step 1 on the web server, since Step 1 completed without errors when all Brain cell type datasets were selected. I then tried to run Steps 2 & 3 locally from the {ds}.gsa.out files that the FUMA web server had already generated. Using those files, I ran fuma_celltype_postStep1.R, which returned magma_celltype_Step1.txt and magma_celltype_Step1_sig.txt covering all the scRNA-seq datasets of interest together.
step1_outputs: contains all the {ds}.gsa.out files
magmadir: contains the MAGMA executable
celltype_filtered: all the preprocessed scRNA-seq datasets of interest (excluding the problematic datasets identified in the trait A trials)
datasets_used.txt: the list of datasets used to create celltype_filtered
(ls step1_outputs/magma_celltype_*.gsa.out | sed -e 's/.*magma_celltype_//' -e 's/\.gsa\.out$//' > datasets_used.txt)
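To double-check that setup, a small consistency check could report every dataset listed in datasets_used.txt that has no matching preprocessed file in celltype_filtered. This is only a minimal bash sketch: it assumes each preprocessed dataset is stored as celltype_filtered/{ds}.txt (a guess about the naming, not something FUMA prescribes), and check_datasets is a hypothetical helper name, not part of FUMA.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of FUMA): print each dataset name listed in
# <list> that has no matching "<name>.txt" file in <dir>.
# The "<name>.txt" naming is an assumption about the preprocessed files.
# Returns 0 if every listed dataset is present, 1 otherwise.
check_datasets() {
  local list="$1" dir="$2" ds missing=0
  # "|| [ -n "$ds" ]" keeps the last line even without a trailing newline
  while IFS= read -r ds || [ -n "$ds" ]; do
    if [ -z "$ds" ]; then continue; fi   # skip blank lines
    if [ ! -f "$dir/$ds.txt" ]; then
      echo "missing: $ds"
      missing=1
    fi
  done < "$list"
  return "$missing"
}
```

Usage would be something like `check_datasets datasets_used.txt celltype_filtered`; any "missing:" lines would show datasets that have Step 1 output but no filtered preprocessed file.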
Now I am unsure whether I was supposed to run fuma_celltype_postStep1.R separately for each region listed on the FUMA web server (e.g. Allocortex, Cerebellum, Hippocampus), since the flowchart mentions {per region}, and then run combine_step1_all_regions.py on the results. Can't I pass my fuma_celltype_postStep1.R outputs directly to create_parent_ds_input.py?
Also, does this mean I have to keep the processed scRNA-seq datasets in per-region subdirectories, as on the FUMA web server, rather than keeping all cell type datasets in a single directory as I did with celltype_filtered?
In other words, am I supposed to set things up like this?
base_dir/
└── TRAIT_A/
    ├── forebrain/                 # {ds}.gsa.out for celltype datasets in forebrain
    │   ├── magma_celltype_step1_sig.txt
    │   └── celltype/*.txt         # filtered pre-processed celltype datasets
    ├── thalamus/                  # {ds}.gsa.out for celltype datasets in thalamus
    │   ├── magma_celltype_step1_sig.txt
    │   └── celltype/*.txt         # filtered pre-processed celltype datasets
    └── ...
Currently I have:
fuma_celltype/
├── step1_outputs/                 # {ds}.gsa.out for all brain celltype datasets
└── celltype_filtered/             # filtered pre-processed celltype datasets
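If the per-region layout does turn out to be required, the flat directories could be rearranged into it with something like the following bash sketch. It assumes a hypothetical tab-separated mapping file with one "{ds}<TAB>{region}" line per dataset, which I would have to build myself from the region grouping shown on the FUMA web server; the function name organize_by_region, the mapping file, and the {ds}.txt naming inside celltype_filtered are all my own assumptions, not part of FUMA.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of FUMA): copy Step 1 outputs and preprocessed
# datasets from flat directories into a per-region layout:
#   <out_dir>/<region>/magma_celltype_<ds>.gsa.out
#   <out_dir>/<region>/celltype/<ds>.txt
# <map> is an assumed tab-separated file with one "<ds><TAB><region>" per line.
organize_by_region() {
  local map="$1" step1_dir="$2" ct_dir="$3" out_dir="$4" ds region
  while IFS=$'\t' read -r ds region || [ -n "$ds" ]; do
    # skip blank or malformed lines (missing dataset or region)
    if [ -z "$ds" ] || [ -z "$region" ]; then continue; fi
    mkdir -p "$out_dir/$region/celltype"
    # Step 1 MAGMA output, named as in step1_outputs/
    cp "$step1_dir/magma_celltype_${ds}.gsa.out" "$out_dir/$region/" \
      || echo "no Step 1 output for $ds" >&2
    # Preprocessed dataset; the <ds>.txt naming is an assumption
    cp "$ct_dir/${ds}.txt" "$out_dir/$region/celltype/" \
      || echo "no preprocessed file for $ds" >&2
  done < "$map"
}
```

For example, `organize_by_region ds_region_map.tsv step1_outputs celltype_filtered base_dir/TRAIT_A` would produce the per-region tree, after which fuma_celltype_postStep1.R could be run inside each region directory. Copying rather than moving keeps the current flat directories intact.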
I would really appreciate some guidance here. Thanks in advance for your time and consideration.