Hi Samantha/Chris,
Ex-Barash lab member here :)
A) Paired reads are counted twice, which is a feature, not a bug :). Usually this isn't a problem, because read lengths are 50-150 bp and inserts are well over 300 bp, so the two mates don't overlap. But if you're concerned that your library's fragment insert sizes are less than 2x the read length and you're seeing lots of overlapping paired reads, I guess you could try trimming one of the mates so that the pair doesn't carry overlapping information. AFAIK, MAJIQ doesn't do anything like that under the hood. I haven't personally come across a library with inserts that small, though, so I might have other concerns about the sample if the fragments are so short :/
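To get a feel for when mates start overlapping, here is a rough back-of-the-envelope sketch. It is not anything MAJIQ does; `mate_overlap` is a hypothetical helper, and it assumes "insert size" means the full fragment length between adapters, with both mates having the same length and no trimming.

```python
def mate_overlap(fragment_len: int, read_len: int) -> int:
    """Number of fragment bases covered by BOTH mates of a pair.

    Hypothetical helper for illustration. Assumes mate 1 reads inward from
    the left end of the fragment and mate 2 reads inward from the right end.
    """
    if fragment_len <= 0 or read_len <= 0:
        raise ValueError("lengths must be positive")
    mate1_end = min(read_len, fragment_len)        # mate 1 covers [0, mate1_end)
    mate2_start = max(0, fragment_len - read_len)  # mate 2 covers [mate2_start, fragment_len)
    return max(0, mate1_end - mate2_start)


# With 100 bp reads, overlap only appears once the fragment is shorter than 200 bp.
for frag in (400, 250, 150, 80):
    print(frag, mate_overlap(frag, read_len=100))
```

If a quick check like this (using the fragment lengths your aligner reports) shows that most pairs overlap, that is the situation where trimming one mate might be worth considering.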
B) If I understand correctly, you're wondering what happens when two fragments have inserts with the same start, but one was sequenced with low-quality base calls at the start of the read and the other wasn't, so that after trimming (or after alignment) their starts no longer match? In this case, yes, MAJIQ would see two distinct start positions. I'm not sure there is any way to determine that these two reads came from two fragments with inserts sharing the same start. I'm also not sure this is a common scenario. Usually, reads have higher-quality base calls at the start, and the quality goes down a bit toward the end of the read. If there is systematically low base-call quality, I'd again be a bit worried about the underlying sample or sequencing run.
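Here is a toy illustration of that scenario, just to make it concrete. It is not MAJIQ code; `mapped_start` and the coordinates are made up, and it assumes forward-strand reads where trimming N bases from the read start shifts the mapped start N bases to the right.

```python
from collections import Counter


def mapped_start(fragment_start: int, bases_trimmed_from_start: int) -> int:
    """Where a forward-strand read appears to start after 5' trimming.

    Hypothetical helper: trimming N leading bases moves the first aligned
    base N positions downstream of the true fragment start.
    """
    return fragment_start + bases_trimmed_from_start


# Two fragments sharing the same true start (1000); the second read lost
# 3 low-quality bases at its start before alignment.
reads = [(1000, 0), (1000, 3)]
starts = Counter(mapped_start(start, trimmed) for start, trimmed in reads)

print(sorted(starts))  # the two reads land on different start positions
```

Once the reads are in this state, nothing in the mapped coordinates alone says they began at the same place, which is why I don't see a way to recover that.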
These are fun questions! Makes me miss the good ole days of drawing reads on the lab windows
Caleb Matthew Radens