Good morning, I'm working with the structural variants data from PCAWG and I'd like to classify the variants by size, mainly the duplications. I have a couple of questions:
1. How do I compute the size of the duplications? In most cases the start and end coordinates are the same, and there is also an alt coordinate. What exactly is this alt? Is the size of the duplication the distance from start to the position given in alt?
chr   start     end       reference  alt            gene  altGene            effect
chr1  27256756  27256756  G          ]1:28028201]G  NUDC  intergenic region  DUP
2. I've observed that the gene column is never "intergenic region". Are these duplications filtered or restricted to coding genes?
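
To make question 1 concrete, here is a minimal sketch of what I would compute. It assumes that alt follows VCF breakend notation, so that the text between the brackets is the mate position as "chrom:pos", and that the duplication size is simply the distance between start and that mate position on the same chromosome. Both points are assumptions on my part, not something I have confirmed for the PCAWG files, and the function names are just illustrative.

```python
import re

# Assumption: alt is a VCF-style breakend string such as "]1:28028201]G",
# where the part between the brackets is the mate position "chrom:pos".
BND_RE = re.compile(r"[\[\]]([^\[\]:]+):(\d+)[\[\]]")


def strip_chr(name):
    """Drop a leading 'chr' so that 'chr1' and '1' compare equal."""
    return name[3:] if name.startswith("chr") else name


def parse_mate(alt):
    """Return (mate_chrom, mate_pos) from a breakend-style alt, or None."""
    match = BND_RE.search(alt)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def dup_size(chrom, start, alt):
    """Distance between start and the mate position, or None if undefined.

    Returns None when alt is not a breakend string or when the mate
    lies on a different chromosome.
    """
    mate = parse_mate(alt)
    if mate is None:
        return None
    mate_chrom, mate_pos = mate
    if strip_chr(chrom) != strip_chr(mate_chrom):
        return None
    return abs(mate_pos - start)


# Example row from above: chr1, start 27256756, alt "]1:28028201]G"
print(dup_size("chr1", 27256756, "]1:28028201]G"))  # 771445
```

Whether the size should be this plain difference or the difference plus one depends on the coordinate convention, which is part of what I'm asking.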
Thanks a lot!
Joan