In principle I don't like the idea of tying the backends of
storeconfigs and inventory together by sharing tables. I'm not clear
on the future of storeconfigs or on a lot of the details of how it's
currently used, and sharing tables would make it harder to change
implementation details on either side. As a specific example, I don't
like the schema storeconfigs has for storing fact data (explained in
more detail below) and would prefer to use a different one. If we
share tables this is awkward.
I propose that we don't share tables, and the inventory service (and
any other future service that needs a database backend) has its own
set of namespaced tables (servicename_tablename). Ideally I would
like to use separate database schemas entirely, but that would be a
bigger, harder-to-manage change, given that the current code relies on
the active_record terminus.
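To make the servicename_tablename convention concrete, here is a minimal plain-Ruby sketch. The helper name namespaced_table and the service name "inventory" are hypothetical, picked only for illustration; this is not existing Puppet code.

```ruby
# Hypothetical helper: builds a table name of the form servicename_tablename.
def namespaced_table(service, table)
  "#{service}_#{table}"
end

# Table names a hypothetical "inventory" service would own.
inventory_tables = [:nodes, :facts].map { |t| namespaced_table(:inventory, t) }
puts inventory_tables.join(", ")
```

With names built this way, each service's tables can be changed or dropped without touching the ones storeconfigs uses.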
Currently the storeconfigs tables dealing with facts look something
like this (I've removed the columns that are irrelevant to the
inventory service):

create_table :hosts do |t|
  t.column :name, :string, :null => false
end

create_table :fact_names do |t|
  t.column :name, :string, :null => false
end

create_table :fact_values do |t|
  t.column :value, :text, :null => false
  t.column :fact_name_id, :integer, :null => false
  t.column :host_id, :integer, :null => false
end

I propose something more like:

create_table :nodes do |t|
  t.column :name, :string, :null => false
  t.column :timestamp, :datetime
end

create_table :facts do |t|
  t.column :name, :string, :null => false
  t.column :value, :text, :null => false
  t.column :node_id, :integer, :null => false
end

It's less normalized than the storeconfigs schema since fact names
will be duplicated per node, but easier to understand and work with,
and I think better satisfies the types of queries we will be doing
which are of the form "select nodes where fact equal to value". The
more normalized schema would be better for queries of the form "select
all values for fact", but I don't think that's something we'll be
doing. Correct me if I'm wrong.
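To show what that query looks like against each schema, here is a rough plain-Ruby sketch in which arrays of hashes stand in for the tables. The method names, node names, and fact values are all made up for the example, and a real implementation would of course issue SQL instead.

```ruby
# Illustrative only: arrays of hashes stand in for database tables.

# Proposed schema: the fact name is stored on each fact row, so
# "select nodes where fact equal to value" needs one filter and one lookup.
def nodes_with_fact(nodes, facts, name, value)
  matches = facts.select { |f| f[:name] == name && f[:value] == value }
  node_ids = matches.map { |f| f[:node_id] }
  nodes.select { |n| node_ids.include?(n[:id]) }.map { |n| n[:name] }
end

# storeconfigs-style schema: the fact name has to be resolved through
# fact_names first, which adds a second lookup (a second join in SQL).
def hosts_with_fact(hosts, fact_names, fact_values, name, value)
  name_ids = fact_names.select { |fn| fn[:name] == name }.map { |fn| fn[:id] }
  matches = fact_values.select do |fv|
    name_ids.include?(fv[:fact_name_id]) && fv[:value] == value
  end
  host_ids = matches.map { |fv| fv[:host_id] }
  hosts.select { |h| host_ids.include?(h[:id]) }.map { |h| h[:name] }
end

# Made-up sample data for the proposed schema.
nodes = [
  { :id => 1, :name => "web01.example.com" },
  { :id => 2, :name => "db01.example.com" },
]
facts = [
  { :node_id => 1, :name => "operatingsystem", :value => "Debian" },
  { :node_id => 2, :name => "operatingsystem", :value => "CentOS" },
]
puts nodes_with_fact(nodes, facts, "operatingsystem", "Debian").inspect
```

Both methods return the same node names for the same data; the difference is only in how many tables have to be consulted to get there.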
Another benefit of the proposed schema is that the "metadata" about
each fact set lives in columns on the node table (Nick has also
proposed that the table be called fact_sets and have a column called
node_name) instead of being stored as a fact. Also, we tend to use
the word host all over our code (in both puppet and dashboard) when we
really ought to use the word node, since host confuses people into
thinking the host name is what identifies a node, when by default it's
the fqdn and could be anything.
> Please share any other comments or concerns you may have related to this
> proposal, particularly if it would interfere with your current use of
> storeconfigs. Thanks.
Questions:
Do or will we want historical fact sets? Current understanding is no,
that we only store the most recent fact set per node. This makes the
database smaller and I can't think of a motivator for wanting
historical fact sets, but maybe someone else can.
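As a rough sketch of what "only the most recent fact set per node" could mean in practice, the plain-Ruby example below keeps one entry per node and overwrites it when a newer set arrives. The method name save_fact_set and the in-memory hash are hypothetical, and ignoring a set with an older timestamp is my own assumption rather than something already decided.

```ruby
# Hypothetical in-memory store: node name => { :timestamp, :facts }.
# Keeps at most one fact set per node. Returns true if the set was stored.
def save_fact_set(store, node_name, timestamp, facts)
  current = store[node_name]
  # Assumption: a fact set that is not newer than the stored one is ignored.
  return false if current && current[:timestamp] >= timestamp
  store[node_name] = { :timestamp => timestamp, :facts => facts }
  true
end

store = {}
save_fact_set(store, "web01.example.com", Time.utc(2011, 2, 25, 12), "uptime" => "1 day")
save_fact_set(store, "web01.example.com", Time.utc(2011, 2, 25, 13), "uptime" => "2 days")
puts store.size
```

The store never grows beyond one entry per node, which is the database-size benefit mentioned above; keeping history would mean appending rows instead of overwriting.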
What other "metadata" do we want to store about facts? Currently the
only metadata we're storing is timestamp.
On Fri, Feb 25, 2011 at 1:55 PM, Matt Robinson <ma...@puppetlabs.com> wrote:
> I propose that we don't share tables, and the inventory service (and
> any other future service that needs a database backend) has its own
> set of namespaced tables (servicename_tablename).
Thanks to those who gave feedback. The general consensus I've reached
talking offline to other devs (Jacob, Nick, Paul) is that the
inventory service should use its own tables, separate from the ones
that storeconfigs currently uses.
The question of whether to normalize or denormalize (which I didn't
mean to be the focus of this discussion at all) can be left up to
the devs who end up working on the implementation, taking the
discussion from this thread into account.
Matt