How is qFar value calculated in net file?

Zhibo Ma

Jan 26, 2018, 6:54:55 PM
to gen...@soe.ucsc.edu
Hi,
The net file format page says that the qFar value represents how far a fill is from the position predicted by its parent, if the fill is on the same chromosome as the parent. I looked at some net files, and the value does not seem to be in number of bases.
Could someone help explain how it is calculated? I know it could be figured out from the source code, but I am pretty new to programming, and it's difficult for me to figure it out myself.
I am going to use this qFar value to help determine whether a rearrangement is local in my pairwise alignment.
Really appreciate it!
Zhibo

Zhibo Ma
Graduate Student
Department of Cellular Biology
University of Georgia


Matthew Speir

Feb 12, 2018, 4:22:27 PM
to Zhibo Ma, gen...@soe.ucsc.edu
Hi Zhibo,

Thank you for your question about calculating the qFar value in Net files. I apologize for the delay in responding to your question. In short, qFar is the distance between the nearest edges of a child fill and its parent gap on the query chromosome, when the parent and child are on the same query chromosome but do not overlap.

This value is calculated by the netSyntenic program when a fill and its parent have the same query chromosome name using the code:

        int intersection = rangeIntersection(fStart, fEnd, pStart, pEnd);
        if (intersection > 0)
            {
            fill->qOver = intersection;
            fill->qFar = 0;
            }
        else
            {
            fill->qOver = 0;
            fill->qFar = -intersection;
            }


Here fStart/fEnd represent the fill start and end; pStart/pEnd represent the parent gap start and end.

If the fill is on a different chromosome from its parent, qFar is -1 (n/a). If the fill is on the same query chromosome and overlaps its parent, then qFar is 0. If the fill is on the same query chromosome but does not overlap its parent, then qFar is the negative of the result of the rangeIntersection function:

    int  rangeIntersection(int start1, int end1, int start2, int end2)
    /* Return amount of bases two ranges intersect over, 0 or negative if no
     * intersection. */
    {
    int s = max(start1,start2);
    int e = min(end1,end2);
    return e-s;
    }


In effect, rangeIntersection(fStart, fEnd, pStart, pEnd) evaluates to

    min(fEnd, pEnd) - max(fStart, pStart)

Examples of how this works out:

Ex 1:

Parent gap start is 500, parent end is 1000. Fill start is 100, fill end is 200.

Putting this into the equation above:
min(200,1000) - max(100,500) = 200 - 500 = -300 (which is then negated --> qFar = 300)

Ex 2:

Parent gap start and end are the same as in Ex 1. Fill start is 1100, fill end is 1200.

Again, putting these into the same equation, we get:
min(1200,1000) - max(1100,500) = 1000 - 1100 = -100 (which is then negated --> qFar = 100)

So, qFar should be the distance in bases, on the query chromosome, between the nearest edges of parent gap and child.
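
As a rough sketch of the logic above, the following standalone C snippet reproduces Ex 1 and Ex 2. The struct qRelation and the function fillVsParent are names made up for this illustration and are not part of the UCSC source; only rangeIntersection mirrors the function quoted above. The sketch assumes the fill and its parent gap are already known to be on the same query chromosome.

```c
#include <assert.h>

/* Hypothetical holder for the two values; not a type from the UCSC source. */
struct qRelation {
    int qOver;  /* bases of overlap on the query chromosome */
    int qFar;   /* gap between nearest edges when there is no overlap */
};

/* Same arithmetic as the rangeIntersection quoted above:
 * positive = overlap in bases, zero or negative = no overlap. */
static int rangeIntersection(int start1, int end1, int start2, int end2)
{
    int s = (start1 > start2) ? start1 : start2;  /* max of the starts */
    int e = (end1 < end2) ? end1 : end2;          /* min of the ends */
    return e - s;
}

/* Hypothetical helper: compute qOver/qFar for a fill and its parent gap,
 * assuming both are on the same query chromosome. */
static struct qRelation fillVsParent(int fStart, int fEnd, int pStart, int pEnd)
{
    struct qRelation rel;
    int intersection;

    assert(fStart <= fEnd && pStart <= pEnd);
    intersection = rangeIntersection(fStart, fEnd, pStart, pEnd);
    if (intersection > 0) {
        rel.qOver = intersection;
        rel.qFar = 0;
    } else {
        rel.qOver = 0;
        rel.qFar = -intersection;
    }
    return rel;
}
```

With the Ex 1 values, fillVsParent(100, 200, 500, 1000) gives qOver = 0 and qFar = 300; with the Ex 2 values, fillVsParent(1100, 1200, 500, 1000) gives qOver = 0 and qFar = 100.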

Here's also a handy visualization of some possible parent/child relationships and their qFar values:

parent: XXXXXXXXXXXX---------------XXXXXXXXXXXXX
child1:    XXXX.....
--> qOverlap = 0, qFar = 5

parent: XXXXXXXXXXXX---------------XXXXXXXXXXXXX
child2:         XXXX
--> qOverlap = 0, qFar = 0

parent: XXXXXXXXXXXX---------------XXXXXXXXXXXXX
child3:           XXXX
--> qOverlap = 2, qFar = 0

parent: XXXXXXXXXXXX---------------XXXXXXXXXXXXX
child4:             XXXX
--> qOverlap = 4, qFar = 0

parent: XXXXXXXXXXXX---------------XXXXXXXXXXXXX
child5:                        XXXX
--> qOverlap = 4, qFar = 0

parent: XXXXXXXXXXXX---------------XXXXXXXXXXXXX
child6:                         XXXX
--> qOverlap = 3, qFar = 0

parent: XXXXXXXXXXXX---------------XXXXXXXXXXXXX
child7:                            XXXX
--> qOverlap = 0, qFar = 0

parent: XXXXXXXXXXXX---------------XXXXXXXXXXXXX
child8:                            ...XXXX
--> qOverlap = 0, qFar = 3
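
The diagram can also be checked numerically. The sketch below assumes each column of the diagram is one base, counted from 0 with half-open ranges, so the parent gap (the dashes) covers [12, 27). The child coordinates are my reading of the diagram, and the names diagramChild, diagramChildren and overlapAndDistance are invented for this illustration.

```c
#include <assert.h>

/* Assumed coordinates of the parent gap in the diagram (0-based, half-open). */
enum { GAP_START = 12, GAP_END = 27 };

/* Hypothetical record for one child row of the diagram. */
struct diagramChild {
    const char *name;
    int start;  /* first column covered by the child's X's */
    int end;    /* one past the last column covered */
};

/* Child positions as read off the diagram (an assumption, not source data). */
static const struct diagramChild diagramChildren[8] = {
    {"child1",  3,  7}, {"child2",  8, 12}, {"child3", 10, 14}, {"child4", 12, 16},
    {"child5", 23, 27}, {"child6", 24, 28}, {"child7", 27, 31}, {"child8", 30, 34},
};

/* min(ends) - max(starts), then split into overlap and distance
 * the same way the netSyntenic snippet above does. */
static void overlapAndDistance(int fStart, int fEnd, int pStart, int pEnd,
                               int *qOver, int *qFar)
{
    int s = (fStart > pStart) ? fStart : pStart;
    int e = (fEnd < pEnd) ? fEnd : pEnd;
    int intersection = e - s;

    assert(qOver != 0 && qFar != 0);  /* both output pointers must be non-null */
    if (intersection > 0) {
        *qOver = intersection;
        *qFar = 0;
    } else {
        *qOver = 0;
        *qFar = -intersection;
    }
}
```

Calling overlapAndDistance on each entry with GAP_START and GAP_END yields the same overlap and qFar values listed under the diagram, including the edge cases child2 and child7, which touch the gap without overlapping it and so get 0 for both.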

I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu. All messages sent to that address are archived on a publicly-accessible Google Groups forum. If your question includes sensitive data, you may send it instead to genom...@soe.ucsc.edu.

Matthew Speir
UCSC Genome Bioinformatics Group

Zhibo Ma

Feb 13, 2018, 11:45:44 AM
to gen...@soe.ucsc.edu, Matthew Speir
Hi Matthew,
I really appreciate your explanation! That's very clear.
And thanks for all the work you guys have done!
Best,

Zhibo
