Hi!
I'm trying out your tool on a small number of samples just to see how it works (3 control vs. 3 treated samples). When I run the parameter fitting step I get an error. Here are the command and the error message:
spit fit_parameters -i counts4SPIT.tsv -m tx2gene.tsv -l pheno.tsv --n_small 3 -O prova_SPIT/ -s 3
--Simulating Experiment No 1
Traceback (most recent call last):
  File "/home/merlino/.local/bin/spit", line 8, in <module>
    sys.exit(main())
  File "/home/merlino/.local/lib/python3.10/site-packages/spit/run_spit.py", line 176, in main
    args.func(args)
  File "/home/merlino/.local/lib/python3.10/site-packages/spit/run_spit.py", line 90, in handle_param_fit
    simulate_exp(args)
  File "/home/merlino/.local/lib/python3.10/site-packages/spit/parameter_fitting/simulate_dtu_exp.py", line 127, in main
    dtu_genes = select_dtu_genes(IFs, ctrl_samples, gene_names)
  File "/home/merlino/.local/lib/python3.10/site-packages/spit/parameter_fitting/simulate_dtu_exp.py", line 13, in select_dtu_genes
    pot_dtu_genes = random.sample(gene_names, 1000)
  File "/usr/lib/python3.10/random.py", line 482, in sample
    raise ValueError("Sample larger than population or is negative")
ValueError: Sample larger than population or is negative
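For reference, the same ValueError can be reproduced outside SPIT: Python's random.sample raises it whenever the requested sample size is larger than the population. The sketch below uses a made-up list of 10 gene names and a hypothetical helper, try_sample; neither comes from SPIT or from my data.

```python
import random

# Hypothetical population of 10 gene names (not taken from the real data).
gene_names = [f"gene_{i}" for i in range(10)]


def try_sample(population, k):
    """Return a random sample of size k, or the error text if sampling fails."""
    try:
        return random.sample(population, k)
    except ValueError as err:
        return str(err)


result_small = try_sample(gene_names, 5)     # fine: 5 <= 10
result_large = try_sample(gene_names, 1000)  # fails: 1000 > 10
print(result_large)
```

So the call on line 13 of simulate_dtu_exp.py would fail in this way if gene_names held fewer than 1000 entries at that point.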
It looks like a problem with the number of columns / samples, but I think I have created the three files correctly:
- counts4SPIT.tsv is the output of tximport with countsFromAbundance = "dtuScaledTPM". It has 7 columns: tx_id and the 6 sample names;
- tx2gene is the first two columns of the RSEM files from the quantification step (transcript and gene columns);
- pheno.tsv is the TSV table with the "id" and "condition" columns, with the ids matching the ones in the counts4SPIT.tsv file.
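A quick sanity check on the first two files could look like the sketch below. It counts how many transcripts in the counts table have an entry in the mapping, and how many distinct genes they belong to. The helper name summarize_inputs, the header assumption for tx2gene, and the toy IDs are all hypothetical; how SPIT itself builds gene_names is not something I know, so this is only a rough check of my inputs.

```python
import csv
import io


def summarize_inputs(counts_file, tx2gene_file, tx2gene_has_header=True):
    """Count transcripts and distinct genes shared by the two TSV tables.

    Both arguments are open text files (or any iterable of lines).
    Assumes the first column of the counts table holds transcript IDs and
    the mapping table has transcript IDs in column 1 and gene IDs in column 2.
    """
    counts_rows = csv.reader(counts_file, delimiter="\t")
    next(counts_rows)  # skip the header row (tx_id + sample names)
    count_tx = {row[0] for row in counts_rows if row}

    map_rows = csv.reader(tx2gene_file, delimiter="\t")
    if tx2gene_has_header:
        next(map_rows)  # skip the header row, if there is one
    tx_to_gene = {row[0]: row[1] for row in map_rows if len(row) >= 2}

    # Transcripts present in both tables, and the genes they map to.
    matched = count_tx.intersection(tx_to_gene)
    genes = {tx_to_gene[tx] for tx in matched}
    return {
        "transcripts_in_counts": len(count_tx),
        "transcripts_matched": len(matched),
        "distinct_genes": len(genes),
    }


# Toy in-memory tables with made-up IDs, standing in for the real files.
toy_counts = io.StringIO("tx_id\ts1\ts2\ntxA\t10\t12\ntxB\t0\t3\ntxC\t5\t5\n")
toy_map = io.StringIO("transcript_id\tgene_id\ntxA\tg1\ntxB\tg1\ntxD\tg2\n")
summary = summarize_inputs(toy_counts, toy_map)
print(summary)
```

On the real data the two StringIO objects would be replaced by open("counts4SPIT.tsv") and open("tx2gene.tsv"). A distinct_genes value below 1000, or a transcripts_matched value far below transcripts_in_counts, would point to the input tables rather than to the number of samples.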
Maybe I'm missing something, let me know!