Hello!
I have made a custom data set of the UniBind TF binding sites from prostate samples (120 different files, regions sorted by chromosome and start coordinate, no overlapping regions, all bgzipped). I'm using the singularity version of Giggle (v0.6.3) to test enrichment of reproducible ATAC peaks from my set of samples against these TF binding sites.
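For context, the index build looked roughly like this; the .sif name and paths are placeholders rather than my exact command, and the flags follow the README example:

```bash
# Build the index over the 120 bgzipped, coordinate-sorted UniBind BED files.
# giggle_v0.6.3.sif and the paths are placeholders for my actual setup.
apptainer exec giggle_v0.6.3.sif \
    giggle index \
    -i "unibind_prostate/*.bed.gz" \
    -o unibind_prostate_index \
    -f -s
```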
Building the index went seemingly fine: there were no errors or warning messages, and it printed out how many intervals were indexed. However, when running searches, most of them failed with segfaults. When I ran the search commands in a for loop (roughly as sketched below) without -v, I didn't even get the error messages on the screen or in the output files, just empty files. Then I added -v, and even the searches that did yield results (albeit clearly incorrect ones compared to the bedtools jaccard results) produced
file cannot be found
errors for some of the db bed files. I tested many things, such as the sort order of the files and starting the search with either apptainer exec giggle or giggle.sh search -C ${PD}/config.ini, until I re-ran everything from scratch on a node with more memory available than my original system had, and now I get results for everything. In hindsight, I should have been more suspicious of the index being at fault when results were not generated for all the files right away.
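The search loop I mentioned was along these lines (again only a sketch with placeholder paths and a placeholder .sif name, not my exact commands):

```bash
# One search per bgzipped query peak file; -s for per-file statistics,
# -v (added later while debugging) for the full overlap records.
# Paths, file names, and the .sif name are placeholders.
mkdir -p results
for query in atac_peaks/*.bed.gz; do
    apptainer exec giggle_v0.6.3.sif \
        giggle search \
        -i unibind_prostate_index \
        -q "${query}" \
        -s -v \
        > "results/$(basename "${query}" .bed.gz).giggle.txt"
done
```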
tl;dr: When building the index on a system with too little memory, it doesn't necessarily stop with an error. I'm not sure whether that can be called a bug, but it's definitely not ideal. What if all the searches had gone through with seemingly no problems and I had taken the results at face value?
Now I'm wondering: Is there a way to test the validity of a Giggle index?
Also, there's still some minor discrepancy between the Giggle results and the bedtools jaccard results for some query/db file pairs, in that Giggle finds more overlaps. Should I be worried about those? Is the number of overlaps expected to be exactly the same?
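For reference, I spot-check a single query/db pair roughly like this (file names are placeholders); the n_intersections column from bedtools jaccard is what I'm comparing against Giggle's overlap count:

```bash
# Spot-check one query/db pair with bedtools jaccard (both files are
# coordinate-sorted and bgzipped). File names are placeholders.
bedtools jaccard \
    -a atac_peaks/sample1_reproducible_peaks.bed.gz \
    -b unibind_prostate/AR_binding_sites.bed.gz
```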
BW,
-Konsta