LiftOver Output


tiziana

Jan 30, 2013, 6:50:35 PM
to gen...@soe.ucsc.edu

Hi. I'm using the liftOver tool to find mouse orthologs for a list of human lncRNAs, and I have some doubts about the output. I give it a BED6 file as input and get back the corresponding BED6 file with the new coordinates, but the fifth column, where I report the size of each sequence, still shows the size of the original human sequence. The strands, reported in the sixth column, do change, and I'm perplexed by these results.

For some input sequences I used the same name: is that a problem?

Thank you in advance for the support.

Best regards,

Tiziana Sanavia

Brooke Rhead

Jan 30, 2013, 10:27:46 PM
to tiziana, gen...@soe.ucsc.edu
Hi Tiziana,

I think I can explain what you are seeing in the liftOver tool.
Officially, the fifth column of a BED file is used for the "score,"
which is normally a number between 0 and 1000, assigned by the person
who created the file. See:

http://genome.ucsc.edu/FAQ/FAQformat.html#format1

The score column is basically ignored by liftOver. It carries the
information along to the output file, but it doesn't do anything to
recalculate it. If your BED file happens to contain a number with a
different meaning in the fifth column, liftOver will still just carry
the information along, leaving it unchanged.
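
As a rough illustration of the point above: if the fifth column is meant to hold each region's size, it can be recomputed from the lifted coordinates after liftOver has run. The sketch below is not part of liftOver. It assumes a tab-separated BED6 file, and the function name and the sample line are made up for the example. Because BED coordinates are 0-based and half-open, the size is chromEnd minus chromStart.

```python
def recompute_size_column(bed_lines):
    """Return BED6 lines with column 5 replaced by chromEnd - chromStart.

    Assumes tab-separated fields. Blank lines and comment, track, and
    browser lines are passed through unchanged.
    """
    out = []
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            out.append(line)
            continue
        fields = line.split("\t")
        start, end = int(fields[1]), int(fields[2])
        fields[4] = str(end - start)  # size in the new (lifted) coordinates
        out.append("\t".join(fields))
    return out


# Hypothetical lifted line that still carries the old human size (800).
lifted = ["chr7\t1000\t1500\tlnc1\t800\t-"]
print(recompute_size_column(lifted)[0])
```

Note that sizes larger than 1000 fall outside the official score range described above, so a separate file or an extra column may be a safer place to store them.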

Strand, on the other hand, does have meaning to liftOver. For instance,
if you give liftOver a region on the positive strand in human that
corresponds to a region on the negative strand in mouse, you will see
the strand column change in the output file.

Using the same name for two different input sequences isn't a problem
for liftOver per se: liftOver will proceed as it normally would if the
input regions had different names. However, the output will likely be
difficult for you to interpret, as you won't be able to tell which
output lines correspond to which input lines. (Actually, in your case
you might be able to distinguish output lines with the same name by
looking at your "score" column.)
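
One way to avoid that ambiguity is to make the names unique before running liftOver, so that every output line can be traced back to its input line. The sketch below is only an illustration. It assumes a tab-separated BED file with at least four columns, and the function name, the "__N" suffix convention, and the sample lines are all hypothetical.

```python
from collections import Counter


def make_names_unique(bed_lines):
    """Append a running counter to the name column (column 4) of each line.

    Every name gets a suffix, so "lnc1" appearing twice becomes
    "lnc1__1" and "lnc1__2". Blank lines and comment, track, and
    browser lines are passed through unchanged.
    """
    seen = Counter()
    out = []
    for line in bed_lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("#", "track", "browser")):
            out.append(line)
            continue
        fields = line.split("\t")
        name = fields[3]
        seen[name] += 1
        fields[3] = f"{name}__{seen[name]}"
        out.append("\t".join(fields))
    return out


# Two made-up input regions that share a name.
regions = [
    "chr1\t100\t900\tlnc1\t800\t+",
    "chr2\t500\t700\tlnc1\t200\t-",
]
for line in make_names_unique(regions):
    print(line)
```

The suffix can be stripped again after liftOver by splitting the name on "__", provided the original names do not already contain that string.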

I hope this information is helpful. If you have further questions,
please reply to gen...@soe.ucsc.edu.

--
Brooke Rhead
UCSC Genome Bioinformatics Group