Timeseries Data Collection as a Reusable App


RLange

Mar 11, 2014, 11:42:24 AM
to django...@googlegroups.com
I'm currently working on an app for browsing and visualizing time-series data. Each data point may be a mix of strings, floats, and ints. In my current design, I have a separate model for each of my data types, and I have been writing a new view for each one. In other words, my app is tightly coupled to the structure of the models. Ideally, the app would use a generic abstraction of a time-series model, so that adding a new type of data is a simple settings.py configuration change and the browse and visualize views come for free.

I see a few possible avenues to accomplish this. None seems clearly better than any other. My database is MySQL. Any feedback is helpful!
  1. Use django-mutant (or equivalent) to make new models on the fly. However, I would really only need to make models at initialization time, not at runtime.
  2. Instead of a column for each dimension of the data, use one Blob or Text column for headers and one for data (a hack for variable-length data that makes intelligent queries nearly impossible)
  3. Forget about pluggability and continue developing under the assumption that this app will never leave "in-house"
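
To make option 2 concrete, here is a minimal, framework-free sketch. It assumes the headers and values are serialized as JSON text; the names `encode_point` and `decode_point` and the sample fields are hypothetical. It illustrates why querying becomes awkward: the values are buried in text, so each row has to be decoded in Python before a value can be compared.

```python
import json


def encode_point(timestamp, values):
    """Pack one data point into two text columns: headers and data."""
    headers = sorted(values)
    return {
        "timestamp": timestamp,
        "headers": json.dumps(headers),
        "data": json.dumps([values[name] for name in headers]),
    }


def decode_point(row):
    """Rebuild the name -> value mapping from the two text columns."""
    return dict(zip(json.loads(row["headers"]), json.loads(row["data"])))


# Hypothetical sample rows mixing floats, strings, and ints.
rows = [
    encode_point("2014-03-11T11:42:24", {"temp": 21.5, "status": "ok", "count": 3}),
    encode_point("2014-03-11T11:43:24", {"temp": 22.1, "status": "warn", "count": 7}),
]

# The comparison happens in Python after decoding, not in the database.
warm = [row["timestamp"] for row in rows if decode_point(row)["temp"] > 22.0]
```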

This is a deliberately vague question since it is about a vague topic: abstraction. I can provide more details about my specific use case, but that seems to defeat the purpose of writing a generic app.

Thank you,
RLange

Jason Arnst-Goodrich

Mar 11, 2014, 12:52:56 PM
to django...@googlegroups.com
Without knowing much about your specifics, I would suggest looking into MongoDB. You'll have some of the issues in #2, and you obviously won't have a generic app that people without Mongo can use, but I'd at least look into it before you go any further.

C. Kirby

Mar 11, 2014, 1:45:19 PM
to django...@googlegroups.com
Like number 2, but a little more amenable to searching: https://github.com/bradjasper/django-jsonfield

Russell Keith-Magee

Mar 11, 2014, 6:33:54 PM
to Django Users
There's actually a fourth option: continue to write project-specific models, but make the visualisation layer generic.

This is what Django's admin does. The admin is able to build a generic administration interface for *any* model, given nothing more than the name of the model. If you provide some additional configuration (like which columns you think are important), it can improve that visualisation.

Admin does this by using Django's introspection capabilities. Every Django model knows what it is called, what fields it has, what relations it has with other models, and so on. This information is all contained in the _meta attribute of every class or instance. ModelForms use the same capability to determine how to display a form for any model.

Back to your example: I imagine you would build a time-series model, and then register it with your visualizer. The visualizer might need to be told details like which field is the "time", and what units that time is in. But from there, it should be possible to find all the "data" columns on the model (or, again, discover them by introspection), and generate a visualisation for them.
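
As a rough illustration of the register-then-introspect idea, here is a framework-free sketch. Everything in it is hypothetical (`Visualizer`, `SensorReading`, and the `time_field` and `time_unit` parameters), and it uses Python dataclasses as a stand-in for Django models; with real models, the field list would come from the model's `_meta` rather than from `dataclasses.fields`.

```python
from dataclasses import dataclass, fields
from datetime import datetime


class Visualizer:
    """Hypothetical registry, loosely modelled on the admin's register() idea."""

    def __init__(self):
        self._registry = {}

    def register(self, model, time_field, time_unit="seconds"):
        names = [f.name for f in fields(model)]
        if time_field not in names:
            raise ValueError(f"{model.__name__} has no field {time_field!r}")
        self._registry[model] = {
            "time_field": time_field,
            "time_unit": time_unit,
            # Introspection: every field except the time axis is a data column.
            "data_fields": [name for name in names if name != time_field],
        }

    def data_fields(self, model):
        return list(self._registry[model]["data_fields"])

    def series(self, model, rows):
        """Return {column: [(time, value), ...]}, one series per data column."""
        config = self._registry[model]
        time_field = config["time_field"]
        ordered = sorted(rows, key=lambda row: getattr(row, time_field))
        return {
            name: [(getattr(row, time_field), getattr(row, name)) for row in ordered]
            for name in config["data_fields"]
        }


@dataclass
class SensorReading:  # hypothetical project-specific model
    taken_at: datetime
    temperature: float
    status: str


visualizer = Visualizer()
visualizer.register(SensorReading, time_field="taken_at")

readings = [
    SensorReading(datetime(2014, 3, 11, 12, 0), 22.1, "warn"),
    SensorReading(datetime(2014, 3, 11, 11, 0), 21.5, "ok"),
]
plots = visualizer.series(SensorReading, readings)
```

The visualizer never names `temperature` or `status` itself; adding a field to the model adds a series to the output without touching the view code.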

A very bare-bones implementation of what you're looking for can be found in Django's databrowse app. Once upon a time, this was shipped as part of Django's contrib packages, but as of Django 1.4 it was split out into a standalone package, and is now being independently maintained:


I hope this helps!

Yours,
Russ Magee %-)

Jonathan Morgan

Mar 12, 2014, 9:44:48 AM
to django...@googlegroups.com
On the model side, to facilitate a generic visualization layer, you could also consider an abstract parent class where you standardize time series information that isn't the data.

For my time series data, I have this:


I built a few Django time-series models, then started to abstract out the parts that were the same across the models. This includes:

- a Time_Period model to hold a set of defined time periods, if your time periods are more complex than simply time increments (for example, I did time-series of reddit activity within subreddits, broken out by hour, and also categorized into being within 14 days before and after a certain date - so before-1, before-2, before-3, ..., after-1, after-2, after-3, ...).

- an AbstractTimeSeriesDataModel that includes basic information on a time period (start and end datetime, time period index, a separate aggregate index in case you have something like my before and after, and so have after-1, which is actually number 327 overall, etc.), plus the ability to set and hold ten different generic filters (so you can mark a given time period as having contained links to news, for example, using filter_1, and store a count of matches within the period in match_count_1, or use filter_2 and match_count_2 to note a particular time-series period had X posts related to a particular topic), and base methods for doing a lookup and retrieving an instance.

Then, for a particular data set, you extend this abstract class and add in columns you want to track within a given time period (see https://github.com/jonathanmorgan/reddit_data for an example - the model Subreddit_Time_Series_Data is a much more complete example than Domain_Time_Series_Data).

An abstract base class like this would give you the common elements of time-series data on which you could start to build a generic visualizer, and also give you the flexibility to add data as needed. You could probably just use a simple naming convention, such as a "tsdata_" prefix, to mark the added columns as data.
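
A minimal sketch of that naming convention, using plain Python dataclasses rather than Django models so it runs on its own. The class and field names are hypothetical and are not taken from the reddit_data repository; in Django the base class would instead be a model whose `Meta` sets `abstract = True`.

```python
from dataclasses import dataclass, fields
from datetime import datetime

TSDATA_PREFIX = "tsdata_"  # hypothetical convention marking data columns


@dataclass
class AbstractTimeSeries:
    """Common bookkeeping shared by every time-series model."""

    start: datetime
    end: datetime
    period_index: int

    @classmethod
    def data_columns(cls):
        """List the fields a generic visualizer should treat as data."""
        return [f.name for f in fields(cls) if f.name.startswith(TSDATA_PREFIX)]


@dataclass
class SubredditSeries(AbstractTimeSeries):
    """Concrete data set: only the prefixed fields are plotted."""

    tsdata_post_count: int = 0
    tsdata_average_score: float = 0.0


columns = SubredditSeries.data_columns()
```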

You could also probably add generic data fields to this abstract model, but it didn't seem to be useful for me when I built this, since I was capturing many fields per time series, and I kept having to add fields as our research led to more questions.

Also worth noting: if you go with a relational database (I personally like relational databases for complex, relational data) and might have millions of rows, PostgreSQL is much easier to get reasonable performance out of than MySQL.


RLange

Mar 14, 2014, 8:34:29 AM
to django...@googlegroups.com
Thank you all for the great input.

Russell, I think your design will make the best framework. After all, defining and registering a model is hardly more difficult than defining each model in the settings. Plus, I'm realizing now that each type of data may come with its own particular functions, so it makes sense to group related models together into an app. It is indeed looking more and more like a clone of the admin app in terms of how it pulls together and abstracts data from other apps.

Jonathan, I'm certainly going to take inspiration from your time-series app, but I don't think I can use it for this particular application, since my data points are much simpler (each one is marked with only a single datetime, and that suffices). I'll look into PostgreSQL as well, although again my models are fairly simple and don't rely too heavily on relational fields.

Regards,
RL