RFC: Supporting partial alignments


Reece Hart

Sep 10, 2019, 8:35:00 PM
to hgvs-discuss
This is a request for comments on a technical issue with transcript-genome alignments.

As some of you may know, NCBI's transcript alignments sometimes do not cover the full span of the transcript. The unaligned gaps occur at the 5' end, at the 3' end, internally, or in some combination. In most cases, hgvs refuses to use these transcripts. Limited evidence suggests that these unaligned regions result from difficult regions in the assembly. In GRCh37, about 0.3% of transcripts (330/108645) have this issue. In GRCh38.p13, the rate is 0.04% (28/68408).
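To make the three kinds of gaps concrete, here is a minimal Python sketch. It assumes a hypothetical representation in which an alignment is a list of aligned (start, end) intervals in 0-based, half-open transcript coordinates, so the 5' end is always position 0 regardless of genomic strand. The names `classify_gaps` and `problem_rate` are illustrative only and are not part of hgvs or UTA; the sketch also ignores indels inside aligned exons. The last function just reproduces the rate arithmetic from the counts quoted above.

```python
from typing import List, Set, Tuple

# Hypothetical representation (not UTA's schema): each aligned exon is a
# (start, end) interval in transcript coordinates, 0-based and half-open.
Interval = Tuple[int, int]


def classify_gaps(tx_len: int, aligned: List[Interval]) -> Set[str]:
    """Return the kinds of unaligned regions in a transcript alignment.

    An empty set means the alignment covers the full transcript.
    """
    spans = sorted(aligned)
    if not spans:
        return {"unaligned"}  # nothing aligned at all
    gaps: Set[str] = set()
    if spans[0][0] > 0:
        gaps.add("5'")  # transcript start is not covered
    if max(end for _, end in spans) < tx_len:
        gaps.add("3'")  # transcript end is not covered
    # Any uncovered stretch between consecutive aligned intervals is internal.
    covered_to = spans[0][1]
    for start, end in spans[1:]:
        if start > covered_to:
            gaps.add("internal")
        covered_to = max(covered_to, end)
    return gaps


def problem_rate(n_partial: int, n_total: int) -> float:
    """Percentage of transcripts with partial alignments."""
    return 100.0 * n_partial / n_total


if __name__ == "__main__":
    print(classify_gaps(100, [(0, 40), (50, 90)]))  # internal and 3' gaps
    print(f"GRCh37:     {problem_rate(330, 108645):.2f}%")
    print(f"GRCh38.p13: {problem_rate(28, 68408):.2f}%")
```

Under this representation, a transcript counts as partially aligned whenever `classify_gaps` returns a non-empty set.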

biocommons/uta#198 was written when I believed this was a large problem; it proposed supporting partial alignments.

I now think that 1) this is a small problem, 2) the problem lies in the underlying assembly, not in UTA (so the fix shouldn't be in UTA), and 3) fixing it in UTA would introduce significant new complexity that would be detrimental overall.

I've provided data and more rationale in https://github.com/biocommons/uta/issues/198#issuecomment-530161207. I would appreciate comments.

-Reece