Does the "bamToBed" program properly handle reads with insertions or deletions in the CIGAR field?
Meaning:
SAM files contain only the start position, the sequence itself, and the CIGAR string.
The naive way to calculate the END position is to add the length of the sequence to the start coordinate.
But if the read was mapped with insertions or deletions (as indicated by the CIGAR field), the end coordinate will be incorrect.
Does the program take the CIGAR into account?
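To make the concern concrete, here is a minimal sketch of how the end coordinate could be derived from the CIGAR rather than from the sequence length. It relies on the SAM specification's rule that M, D, N, = and X consume reference bases while I, S, H and P do not. The CigarOp struct and the referenceEnd() and consumesReference() names are hypothetical stand-ins for illustration, not bedtools or BamTools code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one CIGAR operation (not the BamTools type).
struct CigarOp {
    char type;        // one of M, I, D, N, S, H, P, =, X
    uint32_t length;  // number of bases the operation covers
};

// True if the operation consumes reference bases (per the SAM specification).
bool consumesReference(char type) {
    return type == 'M' || type == 'D' || type == 'N' ||
           type == '=' || type == 'X';
}

// Returns the 0-based, exclusive end coordinate of an alignment that
// begins at the 0-based reference position `start`.
int64_t referenceEnd(int64_t start, const std::vector<CigarOp>& cigar) {
    assert(start >= 0);  // reference coordinates are non-negative
    int64_t end = start;
    for (const CigarOp& op : cigar) {
        if (consumesReference(op.type)) {
            end += op.length;
        }
    }
    return end;
}
```

For a read starting at 100 with CIGAR 10M2I10M, the sequence is 22 bases long, so the naive end would be 122, while the sketch gives 120. With 10M3D10M the sequence is 20 bases long (naive end 120), but the alignment actually spans up to 123.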
thanks,
-gordon
I will do this for the next release. The current release only obeys the M and N CIGAR operations.
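A quick fix, if you need one now, would be to modify the ParseCigarBed() function so that currPosition advances correctly when I or D operations are encountered. A deletion (D) consumes reference bases and an insertion (I) does not, and each case needs its own break so that one operation is not counted several times. One would change the main loop in that function to something like:
    for (; cigItr != cigEnd; ++cigItr) {
        switch (cigItr->type) {
            case 'M':  // alignment match: consumes reference bases
            case 'D':  // deletion from the reference: consumes reference bases
                currPosition += cigItr->Length;
                break;
            case 'I':  // insertion: present in the read only, position unchanged
                break;
        }
    }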
Sorry if I confused things...long day.
Aaron
--
You received this message because you are subscribed to the Google Groups "bedtools-discuss" group.
Currently, bamtobed splits on N (spliced alignments, e.g., RNA-seq) CIGAR ops only. Are you asking for it to split on both N and D ops?
Exactly. Though now that I've looked into it, what is the real difference between N and D? Anyhow, some option (like "-d") to split on D would be useful.
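As a rough illustration of the request, the sketch below splits an alignment into reference blocks on N and, optionally, on D. In the SAM specification both operations consume reference bases without consuming read bases; N marks a skipped region such as an intron, while D marks a deletion relative to the reference. The CigarOp struct, the splitBlocks() function and the splitOnD parameter (mirroring the suggested "-d" option) are all hypothetical, not existing bedtools code.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for one CIGAR operation (not the BamTools type).
struct CigarOp {
    char type;
    uint32_t length;
};

// Each block is a half-open [start, end) interval on the reference.
using Block = std::pair<int64_t, int64_t>;

// Splits an alignment starting at 0-based `start` into blocks.
// N always ends the current block; D does so only when splitOnD is true.
std::vector<Block> splitBlocks(int64_t start,
                               const std::vector<CigarOp>& cigar,
                               bool splitOnD) {
    assert(start >= 0);  // reference coordinates are non-negative
    std::vector<Block> blocks;
    int64_t blockStart = start;
    int64_t pos = start;
    for (const CigarOp& op : cigar) {
        const bool isGap = op.type == 'N' || (splitOnD && op.type == 'D');
        if (isGap) {
            // Close the block before the gap, then skip over the gap.
            if (pos > blockStart) blocks.emplace_back(blockStart, pos);
            pos += op.length;
            blockStart = pos;
        } else if (op.type == 'M' || op.type == 'D' ||
                   op.type == '=' || op.type == 'X') {
            // Reference-consuming operation kept inside the current block.
            pos += op.length;
        }
        // I, S, H and P consume no reference bases, so pos is unchanged.
    }
    if (pos > blockStart) blocks.emplace_back(blockStart, pos);
    return blocks;
}
```

With start 100 and CIGAR 10M5D10M, splitOnD = false yields the single block [100, 125), while splitOnD = true yields [100, 110) and [115, 125).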