--
When the strand value is "-", position coordinates are listed in terms of the reverse-complemented sequence. I assumed this means the start and end positions are relative to the end of the positive strand, so if I want the aligned sequence, I should take the reverse complement of the sequence in (tlen - tend, tlen - tstart] on the positive strand.
Hello Alireza,
You are absolutely right. In a chain file, the coordinates for items on the - strand are given relative to the reverse-complemented sequence, that is, counted from the end of the + strand. To find the item on the + strand, you will need to use the coordinates (tSize-tEnd, tSize-tStart]. When using a GTF file, you do not need to change the coordinates for items on the - strand.
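As a minimal sketch of the conversion described above, here it is in Python. The function names are hypothetical and not part of any UCSC tool. The sketch assumes the 0-based, half-open coordinates used in chain files, which match Python slicing: a - strand interval [start, end) maps to plus_seq[size - end : size - start], and reverse complementing that slice gives the aligned sequence. Only A/C/G/T/N are complemented; any other characters pass through unchanged in this sketch.

```python
from typing import Tuple

# Complement table for plain nucleotides; other characters are left unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def minus_to_plus(size: int, start: int, end: int) -> Tuple[int, int]:
    """Map a half-open [start, end) interval on the reverse-complemented
    sequence to the corresponding half-open interval on the + strand."""
    if not 0 <= start <= end <= size:
        raise ValueError("interval must satisfy 0 <= start <= end <= size")
    return size - end, size - start


def minus_strand_sequence(plus_seq: str, start: int, end: int) -> str:
    """Return the bases of a - strand item, given the + strand sequence
    and the item's coordinates on the reverse-complemented sequence."""
    plus_start, plus_end = minus_to_plus(len(plus_seq), start, end)
    return reverse_complement(plus_seq[plus_start:plus_end])


print(minus_to_plus(10, 0, 3))                     # (7, 10)
print(minus_strand_sequence("AACCGGTTAC", 0, 3))   # GTA
```

For example, with the + strand sequence AACCGGTTAC (size 10), the - strand interval [0, 3) maps to + strand [7, 10), which holds TAC. Its reverse complement, GTA, is the first three bases of the reverse-complemented sequence.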
It is unfortunate that differences like this exist; they can be confusing for people who need to work with many file formats. That is part of why we try to provide tools for working with these formats and converting between them, so that these differences are hidden away.
A word of warning: you may also need to work with the PSL data format (http://genome.ucsc.edu/FAQ/FAQformat.html#format2) at some point. PSL has its own way of describing items on the - strand. The start and end positions of each alignment use + strand coordinates, but the list of block start positions within each alignment uses - strand coordinates. This can be confusing.
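The PSL case can be sketched the same way. This sketch assumes a single-character strand field of "-" (query on the - strand); translated alignments with two-character strand fields are not handled. Each block start in qStarts is converted with the same size-minus-end rule, using the block size to find the block end. The helper names are hypothetical. The parser assumes the comma-separated list fields, with a trailing comma, that PSL files use.

```python
from typing import List, Tuple


def parse_psl_list(field: str) -> List[int]:
    """Parse a PSL list field such as "10,30," (note the trailing comma)."""
    return [int(x) for x in field.split(",") if x]


def psl_minus_blocks_to_plus(
    q_size: int, q_starts: List[int], block_sizes: List[int]
) -> List[Tuple[int, int]]:
    """Convert - strand query block starts to half-open + strand intervals.

    Block i covers [q_starts[i], q_starts[i] + block_sizes[i]) on the
    reverse-complemented query, i.e. [q_size - (start + size), q_size - start)
    on the + strand.
    """
    if len(q_starts) != len(block_sizes):
        raise ValueError("qStarts and blockSizes must have the same length")
    blocks = [(q_size - (start + size), q_size - start)
              for start, size in zip(q_starts, block_sizes)]
    # Blocks ascending on the - strand are descending on the + strand,
    # so reverse them to list the + strand blocks in ascending order.
    blocks.reverse()
    return blocks


blocks = psl_minus_blocks_to_plus(100, parse_psl_list("10,30,"), parse_psl_list("5,10,"))
print(blocks)  # [(60, 70), (85, 90)]
```

For a query of size 100 with qStarts "10,30," and blockSizes "5,10,", the blocks on the + strand are [60, 70) and [85, 90). The overall span [60, 90) should then agree with the alignment's qStart and qEnd fields, which are already in + strand coordinates.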
I hope this is helpful. If you have any further questions, please reply to gen...@soe.ucsc.edu or genome...@soe.ucsc.edu. Questions sent to those addresses will be archived in publicly-accessible forums for the benefit of other users. If your question contains sensitive data, you may send it instead to genom...@soe.ucsc.edu.
--
Jonathan Casper
UCSC Genome Bioinformatics Group
--