On 15 Apr., 09:59, "Aaron Krill" <aa...@krillr.com> wrote:
> If you want your own export filter, simply write it into your application.
> It isn't hard to do.
Sure - but why reinvent the wheel a thousand times? A general API
with open filters would pool our efforts - and would also make the filters
less buggy than if everyone develops filters on their own.
> All you would have to do to handle the binary data is parse the XML tree,
> grab the base64 and decode it. Simple. How is this difficult? This is the
> standard for any XML-based document format that must contend with binary
> data (SVG containing scalar graphics comes to mind). Also ODF and the like
> which may contain similar image data.
I didn't mean base64-encoded data itself is hard to handle - I meant a
single XML file with X gigabytes of base64-encoded data in it isn't easy
to handle.
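For reference, the "parse the XML tree, grab the base64 and decode it" step from the quoted text could look roughly like the sketch below. The export layout (`export`, `entity`, `blob`, `key`) is invented for illustration and is not an actual export format. It uses `iterparse` and clears each element after use so the whole tree is not held in memory, which is the part that matters for very large files; each individual blob is still decoded in memory.

```python
import base64
import io
import xml.etree.ElementTree as ET

# Hypothetical export layout (an assumption, not a real format):
# <export><entity key="..."><blob>BASE64</blob></entity>...</export>
SAMPLE = b"""<export>
  <entity key="img1"><blob>aGVsbG8=</blob></entity>
  <entity key="img2"><blob>d29ybGQ=</blob></entity>
</export>"""


def iter_blobs(source):
    """Yield (key, raw_bytes) per entity without keeping the whole tree."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "entity":
            data = base64.b64decode(elem.findtext("blob", default=""))
            yield elem.get("key"), data
            elem.clear()  # release the element's children once it is handled


blobs = dict(iter_blobs(io.BytesIO(SAMPLE)))
```

Even with streaming, a single multi-gigabyte file still has to be read from start to finish to reach any one entity, which is the objection raised above.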
> I don't see why JSON would not be suitable for a large export -- the format
> is small and easy to understand. It's also far easier to import and export
> from other python applications. In some cases, parsing a JSON "packet" is
> considerably faster than parsing XML as well.
So how would you translate the references to other models into JSON?
Just by writing out the key name as a string? The nice thing about
XML with a schema is that it is self-describing - the nice thing
about a binary format would be that the references could be modeled as
pointers - just the key name as a string seems messy to me - but
as I said before: for small exports this might be the right thing to
do.
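To make the question concrete, here is a minimal sketch of what "the key name as a string" would mean in practice. The model names, the `Kind:id` key format and the `REFERENCE_FIELDS` table are all invented for illustration; none of this is the datastore's real key encoding.

```python
import json

# Hypothetical entities; "author" on a post refers to another entity.
authors = {"Author:1": {"name": "alice"}}
posts = {"Post:7": {"title": "Export formats", "author": "Author:1"}}

# Export: the reference is flattened to a plain string key.
exported = json.dumps({"Author": authors, "Post": posts}, sort_keys=True)

# Import: nothing in the JSON itself says that "author" is a reference,
# so the importer needs out-of-band knowledge to resolve it.
REFERENCE_FIELDS = {"Post": {"author": "Author"}}  # assumed side-channel schema


def resolve(data, kind, key, field):
    """Follow a string reference using the side-channel schema."""
    target_kind = REFERENCE_FIELDS[kind][field]
    return data[target_kind][data[kind][key][field]]


loaded = json.loads(exported)
author = resolve(loaded, "Post", "Post:7", "author")
```

The string reference round-trips fine, but the knowledge that it is a reference lives outside the file, which is the "not self-describing" point.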
On Tue, 2008-04-15 at 10:19 -0700, Frank wrote:
> Other formats might be interesting, but I think you can't avoid XML.
> if anything it has to be at least XML.
> now this is true that datastores with lots of blobs might then export
> huge XML files if you include them as base64 encoded elements.
> So why not encapsulate everything in a zip file instead, containing an
> xml file per object class, but including the binary fields as files
> named using their entity key.
> those files could be organized in nested folders using their class
> name and property name:
Something like this would also be my choice. I use a similar pattern for
the import/export data of objects stored on zodb but I use config.ini
and contents.csv instead of xml. I find it useful because clients can
also easily browse and read these formats.
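A rough sketch of the zip layout proposed in the quoted message, using only the standard library. The exact folder layout was not preserved in the quote, so `<class>/<property>/<entity key>` is a guess based on the description, and the sample data and the `BLOB_PROPS` table are made up.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Hypothetical data: class name -> list of entities (plain dicts).
ENTITIES = {"Photo": [{"key": "k1", "caption": "sunset", "image": b"\x89PNG-bytes"}]}
BLOB_PROPS = {"Photo": {"image"}}  # assumed: which properties are binary


def export_zip(entities, blob_props):
    """Write one XML file per class; binary fields become separate files."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for kind, rows in entities.items():
            root = ET.Element("entities", kind=kind)
            for row in rows:
                elem = ET.SubElement(root, "entity", key=row["key"])
                for name, value in row.items():
                    if name == "key":
                        continue
                    if name in blob_props.get(kind, ()):
                        # Binary field: stored as <class>/<property>/<entity key>
                        zf.writestr("%s/%s/%s" % (kind, name, row["key"]), value)
                    else:
                        ET.SubElement(elem, name).text = str(value)
            zf.writestr("%s.xml" % kind, ET.tostring(root))
    return buf.getvalue()


archive = export_zip(ENTITIES, BLOB_PROPS)
```

With this layout the XML stays small and readable, and a single blob can be pulled out of the archive by name without touching the rest.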
Are you chewing a lot of cpu with your communications? I'd expect
reasonably similar levels of cpu usage between json and django
templating, honestly.
I've been hoping for that to happen for a while. Both GWT and Flex
have pretty good support for JSON.
> > Are you chewing a lot of cpu with your communications? I'd expect
> > reasonably similar levels of cpu usage between json and django
> > templating, honestly.
> I don't have anything up and running yet in Python but taking the case
> of a client grid paging forward 100 rows at a time, that would equate
> to 100 entities serialized to json on the server each request. My
> hunch is that a 5 or 10 times gain in serialization efficiency via a C
> function call would have a material effect on the CPU cycles total
> that clock up against an App Engine application allowance during a busy
> In the case of a 5000 entity bulk data exchange I think we would then
> be talking about a measurable response time difference on a human
> perceived time scale.
Yup, I agree. On the flip side, having done a touch of HCI, I know
there are plenty of ways of hiding that time impact, or at least
allowing the user to continue doing other things while waiting for
data to load.
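One way to replace the hunch with numbers is simply to time it. The sketch below uses the standard library `json` module as a stand-in for whatever serializer the application would use, and the entity shape is invented; the timings depend entirely on the machine and the real data, so no figures are claimed here.

```python
import json
import timeit


def make_entities(n):
    # Hypothetical entity shape: a flat dict with a few properties.
    return [
        {"id": i, "name": "row %d" % i, "price": i * 1.5, "active": i % 2 == 0}
        for i in range(n)
    ]


def time_serialization(n, repeat=20):
    """Average seconds to serialize n entities to a JSON string."""
    rows = make_entities(n)
    seconds = timeit.timeit(lambda: json.dumps(rows), number=repeat)
    return seconds / repeat


page = time_serialization(100)   # one grid page of 100 rows
bulk = time_serialization(5000)  # the 5000-entity bulk exchange
print("100 entities:  %.6f s per request" % page)
print("5000 entities: %.6f s per request" % bulk)
```

Running the same harness against a pure-Python and a C-backed serializer would show whether the 5 to 10 times gain mentioned above is realistic for a given data shape.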
Don't get me wrong, I like some of the concepts of RDF. Hell, the
browser I use is based very heavily on an RDF engine. But, as a data
export format, I have serious questions about the applicability of
RDF. Its verbosity is high, and the tools that can deal with it are
few and far between.
Brett Morgan http://brett.morgan.googlepages.com/
Yeah, I was following Jena for a while. It just fails the
"understandable by an engineer in a five minute elevator pitch" test.
A large chunk of the problem is that there isn't a nice way to
linearise a data graph in a hierarchical storage medium like XML. At
the very least you wind up with references that cut across the
hierarchy. Yes, SPARQL is designed to deal with all of this, and yes,
I've been waiting for an RDF datastore to hit it big.
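A small illustration of the linearisation problem, as a sketch only: the graph, the element names (`node`, `ref`) and the write-once strategy are all made up for the example and are not how RDF/XML actually serialises anything.

```python
import xml.etree.ElementTree as ET

# Hypothetical graph: both "alice" and "bob" point at the same "acme" node.
GRAPH = {"alice": ["acme"], "bob": ["acme"], "acme": []}


def linearise(graph, roots):
    """Write each node once; later occurrences become <ref> elements."""
    doc = ET.Element("graph")
    seen = set()

    def emit(parent, node_id):
        if node_id in seen:
            # Already written elsewhere: this reference cuts across the tree.
            ET.SubElement(parent, "ref", to=node_id)
            return
        seen.add(node_id)
        elem = ET.SubElement(parent, "node", id=node_id)
        for child in graph[node_id]:
            emit(elem, child)

    for root in roots:
        emit(doc, root)
    return doc


doc = linearise(GRAPH, ["alice", "bob"])
xml_text = ET.tostring(doc, encoding="unicode")
```

Here "acme" ends up nested under "alice" purely because "alice" was visited first, and "bob" only holds a pointer to it. The tree shape is an accident of traversal order, not a property of the data.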
But RDF/SPARQL are competing with the easy understandability of a
persistent hash map like we are seeing in Google's BigTable, or
CouchDB, or AWS's SimpleDB. These are readily understandable because
they extend metaphors (a hashmap) that every programmer has already
come to terms with.
RDF datastores are, in my experience, hard to reason about. SPARQL has
a long way to go in optimisation strategies, and moreover, in being
at a place where an engineer can get a reasonable handle on whether a
given SPARQL query will run quickly or slowly. This is an important
consideration. We are currently watching a bunch of engineers coming
to terms with DataStore's performance characteristics, and DataStore
is a lot more predictable than most RDF stores I've played with...
One of the main things that we are gently encouraged to do with GAE is
move processing to write time. This means you need to know ahead of
time everything you want to do.
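As a toy illustration of moving processing to write time, the sketch below keeps a per-post comment count up to date on every write instead of counting at read time. The `Store` class is an in-memory stand-in invented for the example; it is not the GAE datastore API.

```python
class Store:
    """In-memory stand-in for a datastore (an assumption for illustration)."""

    def __init__(self):
        self.comments = []        # raw rows: (post_id, text)
        self.comment_count = {}   # summary maintained at write time

    def add_comment(self, post_id, text):
        self.comments.append((post_id, text))
        # Extra work on every write, decided on ahead of time.
        self.comment_count[post_id] = self.comment_count.get(post_id, 0) + 1

    def count_at_read_time(self, post_id):
        # Flexible, but scans every row: cost grows with the data.
        return sum(1 for pid, _text in self.comments if pid == post_id)

    def count_precomputed(self, post_id):
        # Cheap lookup, but it only answers the question we planned for.
        return self.comment_count.get(post_id, 0)


store = Store()
for post_id, text in [("p1", "first"), ("p2", "hi"), ("p1", "second")]:
    store.add_comment(post_id, text)
```

Both methods return the same answer, but only the read-time scan could answer a question nobody thought of when the write path was designed.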
This means we are trading away the power of a flexible query engine,
but in return we get quick page renders. I suppose the question
becomes, is this trade-off appropriate for the application you want to
build?
> Thanks for the interesting discussion Brett.
My pleasure =)