Infer (and create) Schema from Example Object/Dictionary

42 views
Skip to first unread message

Gregg Lind

unread,
Dec 16, 2011, 1:42:50 PM12/16/11
to sqlal...@googlegroups.com
My search skills are failing me, and I hope you all can help.
(Apologies that there is some heresy here)

Assumptions:

1)  Suppose I have objects made from json (dicts of strings, lists of dicts, etc.)
2)  (for simplicity, assume these nestings don't go very deep)
3)  getting this right 90% of the time is fine, and it can assume there are only 1:1 and 1:many.

Question:

Is there a tool, or what is the easiest way to create / autogenerate a sensible schema based on the object?
I am looking for a function with this sort of signature:


obj = {'id':1,
  'name':  'Gregg',
  'events':  ['ts':  129292939392, 'what':  'keypress'},
                 {'ts':  129292939394, 'what':  'click'}]
}

def gen_schemas_and_create_table_statements(obj=obj, primary_key='id')

that would give something like:

* 2 (or maybe 3) tables:      table1:  pk id, string name  ;  table2 (events):   foreign id, ts, what

along with the sql to create them. 

(this is inspired by all the grossness I deal with in Hive).


Thanks!

GL

Michael Bayer

unread,
Dec 16, 2011, 7:14:58 PM12/16/11
to sqlal...@googlegroups.com

It's possible but a little involved since it has to traverse through the whole structure, build up a tree of all the attribute names it finds, then apply various geometries to known table structures. Step one would be to organize a structure like this:

container:root
name:id; type:int
name:name; type: string; length:5 (based on the longest length you see in the data)
name events; type:collection of event_obj

container:event_obj
name:ts; type:int;
name:what; type:string

then you go through that and generate two tables. the tables would probably get surrogate primary keys added in, and you can also make a rule "if the container already has an integer name 'id', that's the pk".

the structure is essentially a tree (every node has one parent) so I don't think you'd see any many-to-many relationships falling out of it.


Reply all
Reply to author
Forward
0 new messages