Dear Cath, dear Galt,
thank you for your answers. I'll reply to each point you highlighted:
On 08/21/2017 08:38 PM, Cath Tyner wrote:
> let me know how you are building the DAS url?
I connected to the UCSC MySQL server with the command:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19
and built the DAS URL from the data I obtain by executing the MySQL query:
select g.chrom, g.txStart, g.cdsStart from refGene g, knownToRefSeq r
where g.name = 'NM_005504' AND r.value = g.name;
which returns:
+-------+----------+----------+
| chrom | txStart  | cdsStart |
+-------+----------+----------+
| chr12 | 24962957 | 24970941 |
+-------+----------+----------+
I then composed the DAS URL by concatenating the following pieces:
- "http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment="
- "chr12" (from the "chrom" field)
- ":"
- "24962957" +1 (from the txStart field)
- ","
- "24970941" (from the cdsStart field)
This gives exactly the same DAS URL you wrote:
> http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr12:24962958,24970941
I would indeed use this URL to automate the retrieval of the sequence and
avoid using the TableBrowser's web interface.
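A minimal sketch of the URL assembly described above, assuming the automation is done in Python. The helper name "build_das_url" and its "db" parameter are hypothetical, introduced only for illustration; they are not part of any UCSC tool.

```python
def build_das_url(chrom: str, tx_start: int, cds_start: int, db: str = "hg19") -> str:
    """Assemble a UCSC DAS 'dna' URL from the chrom, txStart and cdsStart fields.

    Hypothetical helper: it only mirrors the string assembly listed above.
    """
    # txStart gets +1, exactly as in the list above; cdsStart is used as-is.
    das_start = tx_start + 1
    return (
        f"http://genome.ucsc.edu/cgi-bin/das/{db}/dna"
        f"?segment={chrom}:{das_start},{cds_start}"
    )


# Values taken from the MySQL result shown above.
url = build_das_url("chr12", 24962957, 24970941)
print(url)
```

Called with the values returned by the query, this should reproduce the URL quoted above.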
However, although the sequences obtained from the TableBrowser (the way
I described in my previous mail) and the DAS server are of the same
length, they are different in their content:
(length test)
wc -c sequence_from_das
7985
wc -c sequence_from_tablebrowser
7985
(content test)
cat sequence_from_das | cut -c1-50
aatcttccaactcaatttatttgtttcagctcacgaacttttcagaatcc
cat sequence_from_tablebrowser | cut -c1-50
atggaaaatagaggatacaatggaaaatagaggataccaactgtatgcta
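The same two tests could be scripted roughly as follows. This is only a sketch: "compare_sequences" is a hypothetical helper name, and it assumes both sequences are already available as plain strings (any FASTA header or XML wrapper stripped beforehand).

```python
def compare_sequences(seq_a: str, seq_b: str, prefix_len: int = 50) -> dict:
    """Compare two nucleotide strings by length, by leading characters, and in full."""
    # Drop whitespace/newlines and ignore case so that only the bases are compared.
    a = "".join(seq_a.split()).lower()
    b = "".join(seq_b.split()).lower()
    return {
        "same_length": len(a) == len(b),
        "same_prefix": a[:prefix_len] == b[:prefix_len],
        "identical": a == b,
    }


# The first 50 characters of each sequence, copied from the output above.
das_head = "aatcttccaactcaatttatttgtttcagctcacgaacttttcagaatcc"
tb_head = "atggaaaatagaggatacaatggaaaatagaggataccaactgtatgcta"
result = compare_sequences(das_head, tb_head)
print(result)
```

On the two 50-character excerpts, the lengths match while the content does not, which is the same situation as with the full sequences.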
> If you instead use the following DAS url:
>
http://genome.ucsc.edu/cgi-bin/das/hg19/dna?segment=chr12:24962958,24970> ...a then if you copy the sequence and blat
> <
http://genome.ucsc.edu/cgi-bin/hgBlat> it to the hg19 genome, you will
> find that it aligns perfectly with the Table Browser results.
Thank you, but using BLAT would mean using a web interface, and since I
have to automate the retrieval process, I can't walk this path. Sorry :/
I also tried to automate the retrieval of the sequence without relying
on DAS, directly from UCSC's "knownGeneMrna" table. Here I connect
to UCSC's MySQL interface with the usual:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A -D hg19
followed by this MySQL query:
select g.name, g.chrom, g.strand, g.txStart, g.txEnd, g.cdsStart,
g.cdsEnd, m.seq from knownGene g, knownToRefSeq r, knownGeneMrna m where
r.value = 'NM_005504' AND r.name = g.name AND r.name = m.name;
which returns (I cut the sequence for brevity):
+------------+-------+--------+----------+----------+----------+----------+-----------------+
| name       | chrom | strand | txStart  | txEnd    | cdsStart | cdsEnd   | seq             |
+------------+-------+--------+----------+----------+----------+----------+-----------------+
| uc001rgd.4 | chr12 | -      | 24962957 | 25102393 | 24970941 | 25101866 | agtagggaggtg... |
+------------+-------+--------+----------+----------+----------+----------+-----------------+
Here the retrieved sequence is 9683 nt long, which I suspect is
because I can't specify directly (at MySQL query time) that only the
3'UTR should be returned, but...
this time, if I grep the sequence retrieved from the TableBrowser's web
interface within the sequence I just obtained from the "knownGeneMrna"
table, I find it! :)
More precisely, the sequence I retrieve from the TableBrowser's web interface
(shorter sequence) is contained in the sequence I retrieve from the
"knownGeneMrna" table (longer sequence):
- longer sequence starts, and after 1689 nucleotides
- shorter sequence starts
- shorter and longer sequences are 100% identical
- shorter sequence ends
- longer sequence continues with 10 "a" nucleotides, then ends
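The containment check described in the list above could be automated along these lines. This is a sketch under assumptions: "locate_within" is a hypothetical helper name, and the toy strings below stand in for the real sequences, which are too long to paste here.

```python
def locate_within(longer: str, shorter: str):
    """Return (offset, trailing) if `shorter` occurs in `longer`, else None.

    offset   = number of characters of `longer` before the first match
    trailing = whatever is left of `longer` after the match
    """
    offset = longer.find(shorter)
    if offset == -1:
        return None
    trailing = longer[offset + len(shorter):]
    return offset, trailing


# Toy data only, not the real sequences.
longer = "ggcc" + "acgtacgt" + "aaa"
shorter = "acgtacgt"
hit = locate_within(longer, shorter)
print(hit)
```

With the real numbers above, a 1689 nt prefix plus 10 trailing "a" leaves 9683 - 1689 - 10 = 7984 nt for the shorter sequence, one less than the 7985 reported by wc -c. That would fit wc also counting a trailing newline, but this is only a guess.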
To wrap up:
- I retrieve a sequence from the TableBrowser
- I want to automate the process without using any web interface, so I
try to use DAS, building a DAS URL with the coordinates I obtain
via MySQL query
- the sequences obtained from the DAS server and TableBrowser are equal
in size but different in content although the position is correct
- I try then to obtain the sequence from the "knownGeneMrna" table, and
the sequence I get back _contains_ the sequence I originally got from
the TableBrowser
- this DAS vs. TableBrowser sequence discrepancy is puzzling
- this result I get from the knownGeneMrna table contains the one I get
from the TableBrowser, but I don't get how come the two overlap after
1689 nucleotides
Since I can't use any web interface to automate the process, do you
think I should rely on the DAS server or the "knownGeneMrna" table?
In either case, what would be the best way to get consistent results?
Am I missing something somewhere?
Thank you for your time.
Best regards