> could you post a script or two showing how you might work with this
> schema in practice? Thanks! Uncommented test scripts are just
> fine :)
I suggest you have a look at pygr/apps/ucsc_ensembl_annot.py in my
'ucsc_ensembl' GitHub branch. I keep changing this script on a
regular basis, so in this post I shall use
http://github.com/mkszuba/pygr/commit/b50f1b8b792c58180cac10d778ba22ab263272d0
as the reference. Don't mind the ugly code: as I mentioned earlier, it
is a prototype and will be reorganised before the release.
> > One question: shall we hide from the user that protein data (s)he
> > gets is actually transcript data (i.e come up with some sort of a
> > wrapper) or just let him/her have it as it is?
> I don't understand this question... is the problem that the
> "transcript IDs" would actually be protein IDs? If only for
> traceability, I would suggest a wrapper.
See lines 137-140 in my script, which produce the following output:
ENSP00000372525 annotENST00000383052[0:47226] chrY[2863321:2910547]
What I meant by hiding this from the user was creating a wrapper that
would serve data from the transcript database but be keyed by protein
identifiers; right now the protein-to-transcript-ID mapping is done
explicitly and the database is still just a transcript database.
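
To make the wrapper idea concrete, here is a minimal sketch of what I
have in mind. It is not code from my branch and it does not use any
pygr classes: ProteinView, transcript_db and protein_to_transcript are
hypothetical names, and plain dictionaries stand in for the real
annotation database and ID mapping. It assumes one transcript per
protein ID.

```python
from collections.abc import Mapping


class ProteinView(Mapping):
    """Read-only view of a transcript database, keyed by protein ID.

    Hypothetical sketch: this class is not part of pygr.
    """

    def __init__(self, transcript_db, protein_to_transcript):
        # transcript_db: any dict-like object keyed by transcript ID
        # protein_to_transcript: dict-like, protein ID -> transcript ID
        self._transcript_db = transcript_db
        self._protein_to_transcript = protein_to_transcript

    def __getitem__(self, protein_id):
        # Translate the protein ID, then delegate to the transcript database.
        transcript_id = self._protein_to_transcript[protein_id]
        return self._transcript_db[transcript_id]

    def __iter__(self):
        # Iterating the view yields protein IDs, not transcript IDs.
        return iter(self._protein_to_transcript)

    def __len__(self):
        return len(self._protein_to_transcript)

    def transcript_id(self, protein_id):
        # Keep the underlying transcript ID reachable, for traceability.
        return self._protein_to_transcript[protein_id]


# Stand-in data; a real setup would pass the annotation database instead.
transcripts = {'ENST00000383052': 'annotENST00000383052[0:47226]'}
id_map = {'ENSP00000372525': 'ENST00000383052'}
proteins = ProteinView(transcripts, id_map)
```

With something like this the user would write
proteins['ENSP00000372525'] and never see the transcript ID unless
(s)he asks for it through transcript_id().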
--
MS