How would I then write a sql query that could get all items and their
properties, with each row containing an item and all its properties?
One possible solution is to just have one table, and use alter
statements to modify the table, but that seems really ugly... But
perhaps that is the right solution?
This approach is called EAV (Entity-Attribute-Value) design and we do
not use it SQL! Try using XML or a notebook until you have a real data
model. This approach does not work when you have to go into
production.
I found an old "cut & paste". Someone like you posted this:
CREATE TABLE EAV -- no key declared
(key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL);
INSERT INTO EAV VALUES ('LOCATION','Bedroom');
INSERT INTO EAV VALUES ('LOCATION','Dining Room');
INSERT INTO EAV VALUES ('LOCATION','Bathroom');
INSERT INTO EAV VALUES ('LOCATION','courtyard');
INSERT INTO EAV VALUES ('EVENT','verbal aggression');
INSERT INTO EAV VALUES ('EVENT','peer');
INSERT INTO EAV VALUES ('EVENT','bad behavior');
INSERT INTO EAV VALUES ('EVENT','other');
CREATE TABLE EAV_DATA -note lack of constraints, defaults, DRI
(id INTEGER IDENTITY (1,1) NOT NULL, --proprietary, nonk-relational
bts_id INTEGER NULL,
key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL );
INSERT INTO EAV_DATA VALUES (1, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (1, 'EVENT', 'bad behavior');
INSERT INTO EAV_DATA VALUES (2, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (2, 'EVENT', 'verbal aggression');
INSERT INTO EAV_DATA VALUES (3, 'LOCATION', 'courtyard');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'other');
INSERT INTO EAV_DATA VALUES (3, 'EVENT', 'peer');
Ideally, the result set of the query would be Location Event count
(headings if possible)
Bedroom verbal aggression 1
Bedroom peer 0
Bedroom bad behavior 0
Bedroom other 2
Dining Room verbal aggression 0
Dining Room peer 0
Dining Room bad behavior 0
Dining Room other 0
Bathroom verbal aggression 0
Bathroom peer 0
Bathroom bad behavior 0
Bathroom other 0
courtyard verbal aggression 0
courtyard peer 1
courtyard bad behavior 0
courtyard other 1
Also, if possible, another query would return this result set. (I think
I know how to do this one.)
Location Event count
Bedroom verbal aggression 1
Bedroom other 2
courtyard peer 1
courtyard other 1
Here is an answer From: Thomas Coleman
SELECT Locations.locationvalue, Events.eventvalue,
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue, EventData.eventvalue
FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event'
) AS EventData
ON LocationData.bts_id = EventData.bts_id
) AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events
ORDER BY Locations.locationvalue, Events.eventvalue ,
SELECT Locations.locationvalue, Events.eventvalue
(SELECT COUNT(*)
FROM (SELECT LocationData.locationvalue, EventData.eventvalue
FROM (SELECT TD1.bts_id, TD1.value AS locationvalue
FROM eav_data AS TD1
WHERE TD1.key = 'location') AS LocationData
INNER JOIN
(SELECT TD2.bts_id, TD2.value AS eventvalue
FROM eav_data AS TD2
WHERE TD2.key = 'event') AS EventData
ON LocationData.bts_id = EventData.bts_id)
AS CollatedEventData
WHERE CollatedEventData.locationvalue = Locations.locationvalue
AND CollatedEventData.eventvalue = Events.eventvalue)
FROM (SELECT T1.value AS locationvalue
FROM EAV AS T1
WHERE T1.key = 'location') AS Locations,
(SELECT T2.value AS eventvalue
FROM EAV AS T2
WHERE T2.key = 'event') AS Events;
Is the same thing in a proper schema as:
SELECT L.locationvalue, E.eventvalue, COUNT(*)
FROM Locations AS L, Events AS E
WHERE L.btd_id = E.btd_id
GROUP BY L.locationvalue, E.eventvalue;
The reason that I had to use so many subqueries is that those entities
are all lopped into the same table. There should be separate tables for
Locations and Events.
The column names are seriously painful. Beyond the fact that I
personally hate underscores in column names, using underscores at the
end of the column name is really non-intuitive. I removed them for my
example and came across the next column name faux pas. Don't use "key"
and "value" for column names. It means that the developer *has*
surround the column name with square brackets for everything which is a
serious pain.
There is such a thing as "too" generic. There has to be some structure
or everything becomes nothing more than a couple of tables called
"things". The real key (no pun intended) is commonality. Is there a
pattern to the data that they want to store? It may not be possible to
create one structure to rule them all and in the darkness bind them.
"To be is to be something in particular; to be nothing in particular is
to be nothing." --Aristole
All data integrity is destroyed. Any typo becomes a new attribute or
entity. Entities are found missing attributes, so all the reports are
wrong.
Try to write a single CHECK() constraint that works for all the
attributes of those 30+ entities your users created because you were
too dumb or too lazy to do your job. It can be done! You need a case
expression almost 70 WHEN clauses for a simple invoice and order system
when I tried it as an exercise.
Try to write a single DEFAULT clause for 30+ entities crammed into one
column. Impossible!
Try to set up DRI actions among the entities. If you thought the WHEN
clauses in the single CASE expression were unmaintainable, wait until
you see the "TRIGGERs from Hell" -- Too bad that they might not fit
into older SQL Server which had some size limits. Now maintain it.
For those who are interested, there are couple of links to articles I
found on the net:
Generic Design of Web-Based Clinical Databases
http://www.jmir.org/2003/4/e27/
The EAV/CR Model of Data Representation
http://ycmi.med.yale.edu/nadkarni/eav_CR_contents.htm
An Introduction to Entity-Attribute-Value Design for Generic
Clinical Study Data Management Systems
http://ycmi.med.yale.edu/nadkarni/Introduction%20to%20EAV%20systems.htm
Data Extraction and Ad Hoc Query of an Entity- Attribute- Value
Database
http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubme...
Exploring Performance Issues for a Clinical Database Organized Using
an Entity-Attribute-Value Representation
http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubme...
Before starting out with writing SQL queries right away, you might want to
make sure if your analysis is done right. In practice, no useful logical
model has "dynamic attributes" -- it is often the result of faulty or
incomplete understanding of the business model in the first place.
Some background regarding your business model might be helpful. What exactly
are these items that you try to represent in a table? Are you sure these
items are valid entity types in your business model? Can you give a
realistic example of a few items along with the so called properties of a
particular item?
--
Anith
First of all, you should tell your customer that what they ask for is
expensive. About what solution you pick, the development and maintenance
cost will be high.
> How would I then write a sql query that could get all items and their
> properties, with each row containing an item and all its properties?
Think of it: you have a SELECT that hits 200 items, and some items has eight
properties (not necessarily the same), some has four, some has thirteen etc.
Maybe you need to return 23 columns. But then a new item fits the criteria,
and now you have 25 columns.
In any case, such a query has to be constructed dynamically, because a
query always have a fixed set of return columns. Just to give you
the structure, here is a query for three properties:
SELECT I.itemid, I.itemname,
prop1 = MIN (CASE p.Propcode WHEN 'PROP1' THEN p.Value END),
prop2 = MIN (CASE p.Propcode WHEN 'PROP2' THEN p.Value END),
prop3 = MIN (CASE p.Propcode WHEN 'PROP3' THEN p.Value END)
FROM Items I
LEFT JOIN Properties p ON I.itemid = p.itemid
GROUP BY I.itemid, I.itemname
This is a pivoting query. Without the MIN there would be one row per
property, but with only one non-NULL value on each row. By using MIN
we get all on row. Note that you could use MAX as well.
> One possible solution is to just have one table, and use alter
> statements to modify the table, but that seems really ugly... But
> perhaps that is the right solution?
This can have other problems, as the code does need to know about these
columns. But it's easier to add constraints like "permitted values are
A, B and C". You don't have to work with sql_variant. And most of all,
queries are a lot easier to write against such a table. Say that in
two years ask "Give me all items that have size XL, have gluttony
factor > 0.5 and that have no warehouse defined". Wrting such a query
against a single table is dirt simple. Writing it against a table
with property values is not very fun.
Whether this is a workable solution depends on what sort of admin
we're talking about. Is that a mere a superuser, or someone who
has sysadmin privileges anyway.
Yet a possibilty, which neither that is appealing, is that you
simply add a couple of userdefined1, userdefined2 etc columns.
Whatever strategy, it may pay off for both you and the client if you
make a more thorough analysis of what properties that actually are of
interest. Maybe you can nail down them, and not need any dynamics at
all?
--
Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se
Books Online for SQL Server 2005 at
http://www.microsoft.com/technet/prodtechnol/sql/2005/downloads/books.mspx
Books Online for SQL Server 2000 at
http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx
The specific case for my post is that a customer wants to be able to
list their inventory on a web page. Each item in their inventory has
the same set of properties, but the admin may at sometime decide to add
more properties (for example manufacturer, name, size, color etc etc)
That is why the columns need to be dynamic.
Seems like a problem that is quite common, but so far, all the
solutions i can think of or that have been suggested are quite
complicated. I guess there just is no easy solution?
Thanks for all the replies...
Anyway, if the number of properties are around 25 and apply to all items,
I think you and the customer should really make an effort to nail down
which these properties are, and then go for static tables.
Of course, new requirements will always appear in the future, in which case
the application needs to be augmented.
There are situations where the design you suggested is called for, but
my gut feeling is that this is not the case this time.
But of course, I only know as much about you case, as you have told us
here, so use your own common sense.