proposed schema metadata

Jeremy Carbaugh
May 27, 2009, 9:57:27 AM
to DataCommons
Each schema in the system that contains imported data will have
certain metadata associated with each record. Since we are dealing
with a large amount of data, it makes sense to store a single
source_id on each record, referencing its Import entry, rather than a
set of attributes that would mostly duplicate the same values across
records. This will reduce the total amount of data transferred per
query or bulk download.
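To make the size argument concrete, here is a minimal sketch that serializes the same hypothetical rows twice: once with a full copy of the import metadata on every row, and once with only a source_id. The field values, the row columns (name, amount), and the payload_size helper are all hypothetical illustrations, not part of the Data Commons code.

```python
import json

# Hypothetical import metadata; keys follow the Import fields proposed in this post.
import_meta = {
    "id": 1,
    "timestamp": "2009-05-27T09:57:27",
    "source": "example source",
    "source_file": "example_source.csv",
    "source_size": 1048576,
    "importer": "nightly-import-script",
}

# Hypothetical imported rows; real schemas would have their own columns.
rows = [{"name": f"record {i}", "amount": i * 100} for i in range(1000)]

# Denormalized: every row carries its own copy of all the import metadata.
denormalized = [
    dict(row, **{f"import_{k}": v for k, v in import_meta.items()}) for row in rows
]

# Normalized: every row carries only the import's id, stored as source_id.
normalized = [dict(row, source_id=import_meta["id"]) for row in rows]


def payload_size(records):
    """Return the size in bytes of the records serialized as UTF-8 JSON."""
    return len(json.dumps(records).encode("utf-8"))


print(payload_size(denormalized), payload_size(normalized))
```

The import metadata then only needs to be sent once per import rather than once per record.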

Import
----------------------------------------------------------
id            unique import identifier
timestamp     date and time the data was imported
source        source of the data
source_file   filename of the source data
source_size   size of the source import file
importer      person or process that imported the data
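One way the Import table and the per-record source_id could fit together is sketched below with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table names data_import and record, the record's name column, the byte unit for source_size, and the provenance helper are all hypothetical, not part of the proposal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces REFERENCES constraints when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE data_import (
    id          INTEGER PRIMARY KEY,  -- unique import identifier
    timestamp   TEXT NOT NULL,        -- date and time the data was imported (ISO 8601)
    source      TEXT NOT NULL,        -- source of the data
    source_file TEXT NOT NULL,        -- filename of the source data
    source_size INTEGER,              -- size of the source import file (bytes assumed)
    importer    TEXT NOT NULL         -- person or process that imported the data
);
-- Hypothetical data table: each record points at its import via source_id.
CREATE TABLE record (
    id        INTEGER PRIMARY KEY,
    name      TEXT,
    source_id INTEGER NOT NULL REFERENCES data_import(id)
);
""")

conn.execute(
    "INSERT INTO data_import (id, timestamp, source, source_file, source_size, importer) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    (1, "2009-05-27T09:57:27", "example source", "example.csv", 2048, "import-script"),
)
conn.executemany(
    "INSERT INTO record (name, source_id) VALUES (?, ?)",
    [("first", 1), ("second", 1)],
)


def provenance(record_id):
    """Return (source, source_file, timestamp) for a record by joining on source_id."""
    return conn.execute(
        "SELECT i.source, i.source_file, i.timestamp "
        "FROM record r JOIN data_import i ON r.source_id = i.id "
        "WHERE r.id = ?",
        (record_id,),
    ).fetchone()


print(provenance(1))  # → ('example source', 'example.csv', '2009-05-27T09:57:27')
```

A consumer who needs a record's provenance can fetch the matching Import row once and apply it to every record that shares that source_id.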

As of now, the plan is NOT to include version information. I plan
to archive all of our data imports and make them available to the
public. If someone REALLY wants a historical record, they can download
the Data Commons code and the imports from a specific date and time.