- ask the Python developers about the solution for the shelve
problems. What is the official "long-term solution"?
- test sqlgraph with sqlite3 to ensure we support that platform
Improved sequence database
- we need to finalize the proposed SequenceDB API (see the wiki page),
then merge the best ideas from the seqdb2 prototype, the
FileDBSequence implementation, and implement them under the new API
- we need to propose an API for quantitative data bound to sequence,
either as another kind of sequence, or an annotation. We should start
from the core operations that one needs for using this kind of data
- Russell Meches proposed using netcdf for storing quantitative data,
and will do some performance tests for working with variable length
records
- SQLTable, SQLGraph should provide a standard way to control
ordering. This seems like a general recommendation; maybe all
container and mapping classes should have a consistent way to control
the order of iteration.
- we need to fill in some holes in sqlgraph to ensure that all the
kinds of containers and mappings Jenny needs for Ensembl can be
provided by standard components. Only cases that require a join
across multiple tables should require writing a custom class;
everything else should be handled by a standard component.
- we need a final resolution to the "Ensembl Assembly Version"
question. I would prefer to push hard for a definitive statement from
Ensembl about what genome standard file they are using in each case.
They *must* have that information internally, and it is simply
unacceptable that this information is not available programmatically.
Otherwise we're forced to do a full mapping of each Ensembl genome (to
the extent we can reconstruct it from whatever sources Ensembl
provides programmatically) onto the standard genome that everyone else
uses. This is feasible but really suboptimal -- it tends to impose a
heavy burden on us, and on users, who would have to use an extra
mapping layer to connect any results from Ensembl to any analysis done
with standard genomes (such as UCSC alignments).
- we need to write "Developer Guidelines" to mandate how developers
should get code and make their own code accessible, install and build,
file bug reports and track issues. The developer group discussion
list is great, but doesn't solve all problems. We need to establish
some consistency, or we'll waste time trying to figure out why
different developers run into problems that others can't reproduce.
- Windows testing suggests that the problems lie more in the test
setup itself than in Pygr bugs, so we should try to solve these
and get the test suite running fully on Windows.
I'm going to send this right now, though there may be issues I
missed. Please add your thoughts.
-- Chris
Unpickling saved resources isn't working too well.
Whoops, pyrex stuff isn't installed on Jenny's computer :)
> Unpickling saved resources isn't working too well.
The problem had nothing to do with Pygr. But it was interesting, and
might be considered a failure of the Python pickle module to raise an
appropriate warning message. Jenny was trying to test her code using
doctests that are inserted directly in the module file (adaptor.py)
whose classes she was trying to test. She ran the tests with:
python adaptor.py
But in this case, note that the module is never *imported*, and Python
assigns each class a __module__ attribute of '__main__' instead of the
actual module name 'adaptor'. Python pickling depends on the module
name for automatically re-importing the module that contains the
necessary class(es) during unpickling. Normally, the pickle module
performs a check that the class is actually found in the specified
module namespace, and raises an exception if not. However, in this
case it raised no error or warning message at all. And of course,
when you try to unpickle the object, it fails with a cryptic error
message, decipherable only by someone immersed in pickling methods.
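
To make the mechanism concrete, here is a minimal sketch (my own
illustration, not pygr code) using the standard-library Fraction class
as a stand-in. It shows that a pickle records only the module name and
class name, which the unpickler then uses to re-import the class:

```python
import pickle
from fractions import Fraction

# A pickle records *where to find* a class (module name + class name),
# not the class definition itself.
data = pickle.dumps(Fraction(1, 2))
has_module_name = b"fractions" in data
has_class_name = b"Fraction" in data

# Unpickling re-imports the 'fractions' module and looks up 'Fraction'.
restored = pickle.loads(data)
```

When a file is run as "python adaptor.py", its classes carry a
__module__ of '__main__', so that is the name the pickle would record
in place of 'adaptor'.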
I guess this is another example of a "bug" that's actually in the
testing setup, rather than in the code to be tested. It's a good
thing I didn't spend a bunch of time reading all her code over the web
to debug why her classes couldn't be pickled -- there never was
anything wrong with them! The only way to debug the problem was to
see exactly HOW she ran the test... which is different from how I
normally run tests, and thus never would have occurred to me.
Workarounds:
- to avoid this problem, write a separate script that imports the
module to be tested, and invokes the doctests on that module
- I added a check in pygr.Data's pickler subclass to catch this
situation and print an error message explaining what the user must do
to fix the problem. This addresses the fact that Python pickle fails
to give any kind of warning about this case. I also added a test to
the test suite to verify that our check detects this problem.
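
Both workarounds can be sketched roughly as follows. This is only an
illustration under my own assumptions, not the actual pygr.Data code:
the names run_module_doctests, MainCheckingPickler and dumps_checked
are hypothetical, and the check only looks at instances being pickled.

```python
import doctest
import importlib
import io
import pickle


def run_module_doctests(module_name):
    """Import the named module properly, then run its doctests."""
    module = importlib.import_module(module_name)
    return doctest.testmod(module)


class MainCheckingPickler(pickle.Pickler):
    """Hypothetical pickler that rejects instances of '__main__' classes."""

    def persistent_id(self, obj):
        cls = type(obj)
        if cls.__module__ == "__main__":
            raise pickle.PicklingError(
                "%s is defined in '__main__'; import its module instead of "
                "running the file as a script, or it cannot be unpickled "
                "elsewhere" % cls.__name__
            )
        return None  # None means: pickle this object the usual way


def dumps_checked(obj):
    """Pickle obj to bytes, applying the '__main__' check above."""
    buffer = io.BytesIO()
    MainCheckingPickler(buffer).dump(obj)
    return buffer.getvalue()
```

A separate runner script would then call
run_module_doctests('adaptor') instead of running "python adaptor.py",
so the classes get the module name 'adaptor' rather than '__main__'.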
Further comments on whether the Python pickler should trap this as an
error, or at least provide a warning:
Strictly speaking, it's not *always* an error to pickle a class whose
__module__ is '__main__'. It's conceivable that the user will
guarantee that the class will already be loaded in __main__ on the
receiving side, in which case unpickling will succeed (you could argue
that it doesn't actually unpickle the class in this case; it just
finds it already present in memory). But note that this weird usage
short-circuits the key feature of unpickling, i.e. that the unpickler
automatically finds and imports the right classes for you.
I'd guess that in 99% of real usage, this condition is simply an error
and will cause unpickling to fail, baffling the user. In the context
of pygr.Data (which is supposed to retrieve your data for you, without
you having to do anything else), this condition is *always* an error.
I think the pickle module should at least output a warning message
explaining the problem, which you could suppress by passing a
verbose=False argument. Or perhaps, it should (by default) raise an
exception, unless you explicitly set an option to permit this unusual
pickling scenario (e.g. allow__main__=True).
Titus, do you think it would make sense to pass this question on to
the Python folks? I haven't found discussion among the Python dev
people about this, although I see other people on the web running into
the same problem...
-- Chris