
Querying an EAV Table


Don Miller

Nov 10, 2006, 11:28:32 AM
I'm trying to come up with a query using an entity-attribute-value table.
Sample data is listed below:

Person Table

PersonID PersonName
-------- ----------
1        Mary
2        Joe
3        Paul
4        John


Fact Table

PersonID Attribute Value
-------- ---------- -----------
1        City       Topeka
1        Size       Small
1        Education  College
2        Size       Large
2        Education  College
3        City       Topeka
3        Size       Small
3        Education  High School
4        City       Topeka
4        Education  College

As an example, I would like to identify only those persons (by PersonName)
whose City is "Topeka" AND whose Education is "College". So, the query
should return only the names Mary and John. I'd also like to be able to add a
third (or fourth or fifth) parameter, e.g. those whose Size is "Small" (in
addition to Topeka and College), so that only Mary would be returned. So what
I need is an intersection of sets for n sets.

I'm baffled. Thanks for any direction or help.

Razvan Socol

Nov 10, 2006, 11:42:33 AM
Hello, Don

You can use something like this:

SELECT PersonName FROM PersonTable
WHERE PersonID IN (
    SELECT PersonID FROM FactTable
    WHERE Attribute='City' AND Value='Topeka'
) AND PersonID IN (
    SELECT PersonID FROM FactTable
    WHERE Attribute='Education' AND Value='College'
) /* AND PersonID IN (
    SELECT PersonID FROM FactTable
    WHERE Attribute='Size' AND Value='Small'
) */

Of course, performance will be much worse than with a normal table and
proper indexes (not to mention the maintainability of such queries), so
you should consider whether dropping the EAV model is appropriate.
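To sanity-check the approach, here is the nested-IN query run end-to-end against an in-memory SQLite database from Python (SQLite stands in for SQL Server here; table names follow the thread's sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PersonTable (PersonID INTEGER PRIMARY KEY, PersonName TEXT);
CREATE TABLE FactTable (PersonID INTEGER, Attribute TEXT, Value TEXT);
INSERT INTO PersonTable VALUES (1,'Mary'),(2,'Joe'),(3,'Paul'),(4,'John');
INSERT INTO FactTable VALUES
 (1,'City','Topeka'),(1,'Size','Small'),(1,'Education','College'),
 (2,'Size','Large'),(2,'Education','College'),
 (3,'City','Topeka'),(3,'Size','Small'),(3,'Education','High School'),
 (4,'City','Topeka'),(4,'Education','College');
""")

# One IN subquery per required attribute/value pair; uncomment the last
# two lines of the WHERE clause to also require Size = 'Small'.
names = [r[0] for r in conn.execute("""
    SELECT PersonName FROM PersonTable
    WHERE PersonID IN (SELECT PersonID FROM FactTable
                       WHERE Attribute='City' AND Value='Topeka')
      AND PersonID IN (SELECT PersonID FROM FactTable
                       WHERE Attribute='Education' AND Value='College')
    --AND PersonID IN (SELECT PersonID FROM FactTable
    --                 WHERE Attribute='Size' AND Value='Small')
""")]
print(names)
```

This returns Mary and John, matching the expected result in the original question.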

Razvan

Sericinus hunter

Nov 10, 2006, 11:54:21 AM
How about this?

SELECT PersonId
FROM Facts
WHERE (Attribute = 'City' AND Value = 'Topeka')
   OR (Attribute = 'Education' AND Value = 'College')
GROUP BY PersonId
HAVING COUNT(PersonId) = 2

Then join with Persons to retrieve names.
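This relational-division idea generalizes to any number of attribute/value pairs: match rows against any of the pairs, group by person, and require the match count to equal the number of pairs. A runnable sketch (Python + SQLite; the `persons_matching` helper is my own illustration, not from the thread, and it assumes (PersonID, Attribute) is unique):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, PersonName TEXT);
CREATE TABLE Fact (PersonID INTEGER, Attribute TEXT, Value TEXT);
INSERT INTO Person VALUES (1,'Mary'),(2,'Joe'),(3,'Paul'),(4,'John');
INSERT INTO Fact VALUES
 (1,'City','Topeka'),(1,'Size','Small'),(1,'Education','College'),
 (2,'Size','Large'),(2,'Education','College'),
 (3,'City','Topeka'),(3,'Size','Small'),(3,'Education','High School'),
 (4,'City','Topeka'),(4,'Education','College');
""")

def persons_matching(pairs):
    """Relational division: persons having ALL the given (attr, value) pairs.
    Assumes at most one row per (PersonID, Attribute)."""
    predicates = " OR ".join(["(Attribute = ? AND Value = ?)"] * len(pairs))
    sql = f"""
        SELECT p.PersonName
        FROM Person p
        JOIN Fact f ON f.PersonID = p.PersonID
        WHERE {predicates}
        GROUP BY p.PersonID, p.PersonName
        HAVING COUNT(*) = ?"""
    params = [x for pair in pairs for x in pair] + [len(pairs)]
    return [row[0] for row in conn.execute(sql, params)]

print(persons_matching([("City", "Topeka"), ("Education", "College")]))
print(persons_matching([("City", "Topeka"), ("Education", "College"),
                        ("Size", "Small")]))
```

The first call returns Mary and John; adding the third pair narrows it to Mary only, exactly the "n sets" intersection Don asked for.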

Don Miller

Nov 10, 2006, 12:24:05 PM
Yes, that works. Thanks.

This database is designed to be used offline for research, so performance is
not that much of an issue, and the EAV approach is necessary because there
would otherwise be a never-ending need to add more columns to one table.

The next step (and actually the desired step) is to create query output like
this:

PersonID  Name  FactAttribute  FactValue
--------  ----  -------------  ---------
1         Mary  City           Topeka
1         Mary  Education      College
4         John  City           Topeka
4         John  Education      College

Any ideas on how the query below would be modified?


"Razvan Socol" <rso...@gmail.com> wrote in message
news:1163176953.4...@h54g2000cwb.googlegroups.com...

Anith Sen

Nov 10, 2006, 1:44:42 PM
>> I'm trying to come up with a query using an entity-attribute-value table.

Why would you want to devise a scheme like that? For an idea of the issues
involved, see:
http://tonyandrews.blogspot.com/2004/10/otlt-and-eav-two-big-design-mistakes.html

If a redesign is beyond your control, consider a view to reduce
the damage. You can create one like:

CREATE VIEW vw ( person, city, size, education )
AS
SELECT person_id,
       MAX( CASE attr WHEN 'City' THEN Value END ) AS City,
       MAX( CASE attr WHEN 'Size' THEN Value END ) AS Size,
       MAX( CASE attr WHEN 'Education' THEN Value END ) AS Education
FROM Facts
GROUP BY person_id ;

You can have your query against this view as:

SELECT *
FROM vw
INNER JOIN persons p1
   ON p1.person_id = vw.person
WHERE vw.city = @city
  AND vw.education = @edu ;
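Here is the view idea run end-to-end (Python + SQLite standing in for SQL Server; the `@city`/`@edu` parameters become literals, and the `attr` values are spelled to match the sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (person_id INTEGER PRIMARY KEY, person_name TEXT);
CREATE TABLE Facts (person_id INTEGER, attr TEXT, Value TEXT);
INSERT INTO Persons VALUES (1,'Mary'),(2,'Joe'),(3,'Paul'),(4,'John');
INSERT INTO Facts VALUES
 (1,'City','Topeka'),(1,'Size','Small'),(1,'Education','College'),
 (2,'Size','Large'),(2,'Education','College'),
 (3,'City','Topeka'),(3,'Size','Small'),(3,'Education','High School'),
 (4,'City','Topeka'),(4,'Education','College');

-- Pivot the EAV rows into one row per person, one column per attribute.
CREATE VIEW vw AS
SELECT person_id,
       MAX(CASE attr WHEN 'City'      THEN Value END) AS City,
       MAX(CASE attr WHEN 'Size'      THEN Value END) AS Size,
       MAX(CASE attr WHEN 'Education' THEN Value END) AS Education
FROM Facts
GROUP BY person_id;
""")

rows = conn.execute("""
    SELECT p.person_name, vw.City, vw.Size, vw.Education
    FROM vw JOIN Persons p ON p.person_id = vw.person_id
    WHERE vw.City = 'Topeka' AND vw.Education = 'College'
""").fetchall()
print(rows)
```

Note that attributes a person never supplied (John's Size) simply come back as NULL, which is one of the things the view buys you over raw EAV rows.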

--
Anith


Don Miller

Nov 10, 2006, 2:40:54 PM
Thank you for your suggestion. I didn't think of using views at all and I
LIKE it.

Actually, I am in total control of the data model and I am developing it
in concert with custom query tools. I am familiar with the downsides of EAV,
but I am not fortunate enough to be working in a domain where I can
anticipate all the columns a table will ever need, nor am I willing to change
the data model whenever heterogeneous data needs a place.

Here is an abstract discussing the kind of things I am dealing with:

http://ycmi.med.yale.edu/nadkarni/Abstracts.htm#AdHocQuery


"Anith Sen" <an...@bizdatasolutions.com> wrote in message
news:e%23sOKhPB...@TK2MSFTNGP03.phx.gbl...

Anith Sen

Nov 10, 2006, 2:56:26 PM
Don,

>> Here is an abstract discussing the kind of things I am dealing with:
>> http://ycmi.med.yale.edu/nadkarni/Abstracts.htm#AdHocQuery

I am familiar with Nadkarni's work and his online influence on others trying
to attempt such designs. Since it comes out of Yale from someone with a PhD,
many folks assume it is worth something. He ignores several critical
integrity issues, glosses over the real complexity of query expressiveness,
and skips certain mundane yet fundamental issues. In a nutshell, his
suggestions and proposals are nonsense.

Here is a draft version of an article on common lookup tables that details
the problems: http://www.projectdmx.com/dbdesign/lookup.aspx (Note that it
does not address the EAV issue directly, but the drawbacks and the purported
advantages are the same for both EAV and common lookup tables.)

--
Anith


--CELKO--

Nov 11, 2006, 9:13:05 AM
>> This database is designed to be offline for use in research so performance is not that much of an issue .. <<

And apparently neither is data integrity. Just for fun, try to write a
basic CHECK() constraint on your magical, shapeless fact_value column; you
get a CASE expression with hundreds of very complex WHEN clauses -- they have
to handle converting NVARCHAR() to all the other data types. So you do
not bother with it, and pretty soon you have a color attribute with the
value 3.14159.

I worked for a company that used this approach. EAV destroys a system in
about a year of production work. Orphaned rows are also a huge problem,
as they fill up the disk space.

>> and the need for an EAV approach is necessary because there would otherwise be a never-ending need for adding more columns on one table. <<

No, it is not. An RDBMS assumes that you know what you are doing --
research time is over and you can now build a complete and consistent
model of your enterprise.

Pick another tool for keeping your notes. Free text and a string
search tool is good. LISP is good. SQL is not.

I found an old "cut & paste". Someone like you posted this:

CREATE TABLE EAV -- no key declared, all null-able
(key_col VARCHAR (10) NULL,
attrib_value VARCHAR (50) NULL);

INSERT INTO EAV VALUES ('LOCATION', 'Bedroom');
INSERT INTO EAV VALUES ('LOCATION', 'Dining Room');
INSERT INTO EAV VALUES ('LOCATION', 'Bathroom');
INSERT INTO EAV VALUES ('LOCATION', 'courtyard');
INSERT INTO EAV VALUES ('EVENT', 'verbal aggression');
INSERT INTO EAV VALUES ('EVENT', 'peer');
INSERT INTO EAV VALUES ('EVENT', 'bad behavior');
INSERT INTO EAV VALUES ('EVENT', 'other');

CREATE TABLE EAV_DATA -- note lack of constraints, defaults, DRI
(id INTEGER IDENTITY (1, 1) NOT NULL, -- magical physical locator
 bts_id INTEGER NULL,
 key_col VARCHAR (10) NULL,
 attrib_value VARCHAR (50) NULL );

INSERT INTO EAV_DATA (bts_id, key_col, attrib_value) VALUES (1, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA (bts_id, key_col, attrib_value) VALUES (1, 'EVENT', 'other');
INSERT INTO EAV_DATA (bts_id, key_col, attrib_value) VALUES (1, 'EVENT', 'bad behavior');
INSERT INTO EAV_DATA (bts_id, key_col, attrib_value) VALUES (2, 'LOCATION', 'Bedroom');
INSERT INTO EAV_DATA (bts_id, key_col, attrib_value) VALUES (2, 'EVENT', 'other');
INSERT INTO EAV_DATA (bts_id, key_col, attrib_value) VALUES (2, 'EVENT', 'verbal aggression');
INSERT INTO EAV_DATA (bts_id, key_col, attrib_value) VALUES (3, 'LOCATION', 'courtyard');
INSERT INTO EAV_DATA (bts_id, key_col, attrib_value) VALUES (3, 'EVENT', 'other');
INSERT INTO EAV_DATA (bts_id, key_col, attrib_value) VALUES (3, 'EVENT', 'peer');

Ideally, the result set of the query would be (with headings if possible):

Location     Event              count
-----------  -----------------  -----
Bedroom      verbal aggression  1
Bedroom      peer               0
Bedroom      bad behavior       0
Bedroom      other              2
Dining Room  verbal aggression  0
Dining Room  peer               0
Dining Room  bad behavior       0
Dining Room  other              0
Bathroom     verbal aggression  0
Bathroom     peer               0
Bathroom     bad behavior       0
Bathroom     other              0
courtyard    verbal aggression  0
courtyard    peer               1
courtyard    bad behavior       0
courtyard    other              1

Also, if possible, another query would return this result set. (I think
I know how to do this one.)

Location   Event              count
---------  -----------------  -----
Bedroom    verbal aggression  1
Bedroom    other              2
courtyard  peer               1
courtyard  other              1

Here is an answer from Thomas Coleman:

SELECT Locations.locationvalue, Events.eventvalue,
       (SELECT COUNT(*)
        FROM (SELECT LocationData.locationvalue, EventData.eventvalue
              FROM (SELECT TD1.bts_id, TD1."value" AS locationvalue
                    FROM eav_data AS TD1
                    WHERE TD1."key" = 'location') AS LocationData
                   INNER JOIN
                   (SELECT TD2.bts_id, TD2."value" AS eventvalue
                    FROM eav_data AS TD2
                    WHERE TD2."key" = 'event') AS EventData
                   ON LocationData.bts_id = EventData.bts_id
             ) AS CollatedEventData
        WHERE CollatedEventData.locationvalue = Locations.locationvalue
          AND CollatedEventData.eventvalue = Events.eventvalue)
FROM (SELECT T1."value" AS locationvalue
      FROM EAV AS T1
      WHERE T1."key" = 'location') AS Locations,
     (SELECT T2."value" AS eventvalue
      FROM EAV AS T2
      WHERE T2."key" = 'event') AS Events
ORDER BY Locations.locationvalue, Events.eventvalue;

That is the same thing, in a proper schema, as:

SELECT L.locationvalue, E.eventvalue, COUNT(*)
FROM Locations AS L, Events AS E
WHERE L.btd_id = E.btd_id
GROUP BY L.locationvalue, E.eventvalue;

The reason that I had to use so many subqueries is that those entities
are all plopped into the same table. There should be separate tables
for Locations and Events.

The column names are seriously painful. Don't use reserved words like
"key" and "value" for column names. It means that the developer *has to*
surround the column name with double quotes for everything. And they
are too vague to be data element names anyway!

There is such a thing as "too" generic. There has to be some structure
or everything becomes nothing more than a couple of tables called
"things". The real key (no pun intended) is commonality. Is there a
pattern to the data that they want to store? It may not be possible to
create one structure to rule them all and in the darkness bind them.

"To be is to be something in particular; to be nothing in particular is
to be nothing." --Aristole

All data integrity is destroyed. Any typo becomes a new attribute or
entity. Entities turn up missing attributes, so all the reports are
wrong.

Try to write a single CHECK() constraint that works for all the
attributes of those 30+ entities your users created because you were
too dumb or too lazy to do your job. It can be done! When I tried it as
an exercise, a simple invoice-and-order system needed a CASE expression
with almost 70 WHEN clauses.

Try to write a single DEFAULT clause for 30+ entities crammed into one
column. Impossible!

Try to set up DRI actions among the entities. If you thought the WHEN
clauses in the single CASE expression were unmaintainable, wait until
you see the "TRIGGERs from Hell" -- too bad they might not fit into
older versions of SQL Server, which had size limits. Now maintain it.

For those who are interested, there are a couple of links to articles I
found on the net:

Generic Design of Web-Based Clinical Databases
http://www.jmir.org/2003/4/e27/

The EAV/CR Model of Data Representation
http://ycmi.med.yale.edu/nadkarni/eav_CR_contents.htm

An Introduction to Entity-Attribute-Value Design for Generic
Clinical Study Data Management Systems
http://ycmi.med.yale.edu/nadkarni/Introduction%20to%20EAV%20systems.htm

Data Extraction and Ad Hoc Query of an Entity-Attribute-Value
Database
http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubme...

Exploring Performance Issues for a Clinical Database Organized Using
an Entity-Attribute-Value Representation
http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubme...

Sericinus hunter

Nov 11, 2006, 11:04:32 AM
Joe,

With all my respect, you did not find the articles listed
below, I did. You just copy/pasted my text from here:

http://groups.google.com/group/microsoft.public.sqlserver.programming/browse_frm/thread/f4999435b40dfd8f/1ad9282ec51bfd30?lnk=st&q=&rnum=1&hl=en#1ad9282ec51bfd30

including the "I found..." part. And this is not the first time you have done
it. I don't mind you mentioning these articles in your canned replies, but
please at least change "I found" to "somebody found".



--CELKO--

Nov 11, 2006, 5:55:08 PM
>> With all my respect, you did not find the articles listed below, I did. You just copy/pasted my text from here: <<

You are right; I apologize. I will clean up my cut & paste and add some
more blog references to EAV disasters.

Don Miller

Nov 13, 2006, 9:58:05 AM
"--CELKO--" wrote in message

>> This database is designed to be offline for use in research
>> so performance is not that much of an issue ..

> And apparently neither is data integrity...


> pretty soon have a color attribute with the
> value 3.14159

In this case, the input data is structured and controlled (no free-text
entry of attributes or values). BUT... this structured data can vary from
organization to organization, and this database MUST accommodate existing
and NEW organizations, each with its own new and disparate attributes and
values. Row-modeled EAV is used in EVERY database that must accommodate new
brands and cans of peas in Product tables.

>> and the need for an EAV approach is necessary because there
>> would otherwise be a never-ending need for adding more
>> columns on one table.

> RDBMS assumes that you know what you are doing

I'm not sure what this means. Is this just insulting, or is your experience
limited to situations, no matter how large and complex, where all the
possible data is known and can never change in a wholesale fashion (dozens
or hundreds of new distinct datapoints)?

> research time is over and you can now build a complete
> and consistent model of your entrerprise.

Yes, it would be great to work with a defined, finite, and never-changing
dataset that could be modeled once and forgotten. But there are always new
brands of peas and new research questions and data collection needs that
defy prediction or imagination. Adding columns to tables on a periodic basis
is not an option.

No one has ever said that EAV is perfect (or easy) but it can accommodate
whatever data is thrown at it, and it is left to metadata and middle-tier
logic to present, query, and make sense out of that data.

Razvan Socol

Nov 14, 2006, 2:09:31 AM
Hello, Don

You can use something like this:

SELECT p.PersonID, p.PersonName, f.Attribute, f.Value
FROM PersonTable p INNER JOIN FactTable f ON p.PersonID=f.PersonID
WHERE p.PersonID IN (
    SELECT f1.PersonID FROM FactTable f1
    WHERE f1.Attribute='City' AND f1.Value='Topeka'
) AND p.PersonID IN (
    SELECT f2.PersonID FROM FactTable f2
    WHERE f2.Attribute='Education' AND f2.Value='College'
) AND (f.Attribute='City' OR f.Attribute='Education')

or like this:

SELECT p.PersonID, p.PersonName, f.Attribute, f.Value
FROM PersonTable p INNER JOIN FactTable f ON p.PersonID=f.PersonID
WHERE p.PersonID IN (
    SELECT PersonID FROM FactTable
    WHERE Attribute='City' AND Value='Topeka'
       OR Attribute='Education' AND Value='College'
    GROUP BY PersonID HAVING COUNT(*)=2
)
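Run against SQLite (standing in for SQL Server), the second form produces exactly the four-column output Don asked for; an ORDER BY is added here, my addition, to make the row order deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PersonTable (PersonID INTEGER PRIMARY KEY, PersonName TEXT);
CREATE TABLE FactTable (PersonID INTEGER, Attribute TEXT, Value TEXT);
INSERT INTO PersonTable VALUES (1,'Mary'),(2,'Joe'),(3,'Paul'),(4,'John');
INSERT INTO FactTable VALUES
 (1,'City','Topeka'),(1,'Size','Small'),(1,'Education','College'),
 (2,'Size','Large'),(2,'Education','College'),
 (3,'City','Topeka'),(3,'Size','Small'),(3,'Education','High School'),
 (4,'City','Topeka'),(4,'Education','College');
""")

# Detail rows (PersonID, Name, Attribute, Value) for persons matching
# BOTH conditions; the subquery does the relational division.
rows = conn.execute("""
    SELECT p.PersonID, p.PersonName, f.Attribute, f.Value
    FROM PersonTable p INNER JOIN FactTable f ON p.PersonID = f.PersonID
    WHERE p.PersonID IN (
        SELECT PersonID FROM FactTable
        WHERE Attribute='City' AND Value='Topeka'
           OR Attribute='Education' AND Value='College'
        GROUP BY PersonID HAVING COUNT(*) = 2
    ) AND f.Attribute IN ('City', 'Education')
    ORDER BY p.PersonID, f.Attribute
""").fetchall()
for row in rows:
    print(row)
```

This yields two rows each for Mary (PersonID 1) and John (PersonID 4), matching the desired output shape.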

If you really must use EAV, at least make sure that you have a unique
key in the FactTable on (PersonID, Attribute), to ensure that you don't
have two different values for the same attribute of the same person.
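That unique key can be declared directly as a composite primary key. A minimal SQLite sketch showing a conflicting second value for the same (PersonID, Attribute) being rejected:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FactTable (
    PersonID  INTEGER NOT NULL,
    Attribute TEXT    NOT NULL,
    Value     TEXT    NOT NULL,
    PRIMARY KEY (PersonID, Attribute)  -- one value per attribute per person
);
INSERT INTO FactTable VALUES (1, 'City', 'Topeka');
""")

try:
    # Second, contradictory value for the same person/attribute pair.
    conn.execute("INSERT INTO FactTable VALUES (1, 'City', 'Wichita')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The constraint also happens to be what makes the HAVING COUNT(*)=2 trick in the query above safe: without it, duplicate rows could inflate the count.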

Razvan

Sericinus hunter

Nov 14, 2006, 9:58:40 AM

Sure. Also, it would be nice to see a couple of references to what you
consider the right solutions for this kind of problem.

--CELKO--

Nov 14, 2006, 6:50:21 PM
>> Row-modeled EAV is used in EVERY database that must accommodate new brands and cans of peas in Product tables. <<

Bad example; you would use the industry-standard UPC/EAN/GTIN codes on
the cans as the natural key for the Inventory table. Each manufacturer
will have his own UPC number, within which we would find the various peas
that differ by type of peas and the size of the can. The grocery
business does not use EAV and never will, because it is a nightmare.

EAV is dangerous, slow, and usually a sign of a lack of research. Another
example was a posting a few years back for a record collection in an
EAV. The poster had not heard of the Schwann catalog or the RIAA and was
inventing his own system on the fly.

Don Miller

Nov 14, 2006, 11:39:21 PM
Maybe a bad example, but you have not addressed how systems that are not so
clearcut or standardized or finite (e.g. clinical medicine or biomedical
research) can be modeled using an RDBMS without EAV. And again, no one has
suggested that EAV is a perfect solution, since it has well-known potential
integrity issues, it is cumbersome, and it is slow.

Maybe you could write a white paper or a book on how to address collecting
unpredictable new data classes and ever-changing datasets without using a
technique like EAV and end this sort of discussion once and for all. Try
working with this rather "simple" data collection tool
http://sales.acog.com/acb-custom/aa128.pdf for starters. If this never
changed, anyone could map this to a database. But just imagine that about
half of the questions change every year, a half-dozen new tests are added
every year, but you still have to keep the old questions and their answers
and tests available for the next 18 years. AND, you can't change the
underlying data model every year.

Or, you could continue to provide non-constructive and non-productive
commentary.


"--CELKO--" <jcel...@earthlink.net> wrote in message
news:1163548221.7...@f16g2000cwb.googlegroups.com...

Damien

Nov 15, 2006, 5:46:57 AM
Don Miller wrote:

> Maybe a bad example, but you have not addressed how systems that are not so
> clearcut or standardized or finite (e.g. clinical medicine or biomedical
> research) can be modeled using a RDMS without EAV. And again, no one has
> suggested that EAV is a perfect solution, since it has well-known potential
> integrity issues, it is cumbersome, and it is slow.
>
> Maybe you could write a white paper or a book on how to address collecting
> unpredictable new data classes and ever-changing datasets without using a
> technique like EAV and end this sort of discussion once and for all. Try
> working with this rather "simple" data collection tool
> http://sales.acog.com/acb-custom/aa128.pdf for starters. If this never
> changed, anyone could map this to a database. But just imagine that about
> half of the questions change every year, a half-dozen new tests are added
> every year, but you still have to keep the old questions and their answers
> and tests available for the next 18 years. AND, you can't change the
> underlying data model every year.
>
> Or, you could continue to provide non-constructive and non-productive
> commentary.
>

Indeed. One of our systems uses something along these lines. In our
case, we have a series of questions which are asked of users, and those
answers are then evaluated later by another system. We were tasked to
allow the questions to be varied in the future (but interestingly, if
an old client comes back, they keep the set of questions which were
valid when they first used the system).

In our model, we have a DataItems table which stores a name for a
DataItem, and a TypeRequired column. TypeRequired is propagated through
various tables (e.g. PageFields, ClientAnswers), and is used to
control, for instance, which controls can be associated with a data
item (so that data input is correct), and which columns of the
ClientAnswers table can be filled.

E.g. we have a data item called "NumChildren", with a TypeRequired of
"Int". Through proper use of foreign key and check constraints, we
ensure that, if there is a ClientAnswer for this DataItem, then:
1) there is only one (the PK on ClientAnswers is on ClientID and DataItemID);
2) the TypeRequired in the ClientAnswers table is set to "Int" (an FK of
DataItemID and TypeRequired referencing the DataItems table); and
3) the data stored is of the correct type (a check constraint has a
series of checks of the form (IntValue IS NULL OR TypeRequired = 'Int'),
one for each value column).

Using this technique, I am quite sure that we are not going to get
magically wrong values entered into our DB, even though the
ClientAnswers table is essentially an EAV table.
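A minimal sketch of that typed-EAV pattern in SQLite (table and column names here are illustrative, not Damien's actual schema): each answer row carries its required type, and CHECK constraints keep every typed value column NULL unless the row's type matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ClientAnswers (
    ClientID     INTEGER NOT NULL,
    DataItemID   TEXT    NOT NULL,
    TypeRequired TEXT    NOT NULL CHECK (TypeRequired IN ('Int', 'Text')),
    IntValue     INTEGER,
    TextValue    TEXT,
    PRIMARY KEY (ClientID, DataItemID),           -- one answer per item
    CHECK (IntValue  IS NULL OR TypeRequired = 'Int'),
    CHECK (TextValue IS NULL OR TypeRequired = 'Text')
);
""")

# An Int-typed answer with an integer value: accepted.
conn.execute("INSERT INTO ClientAnswers VALUES (1, 'NumChildren', 'Int', 2, NULL)")

try:
    # A text value in an Int-typed row violates the CHECK constraint.
    conn.execute("INSERT INTO ClientAnswers VALUES (1, 'FavColor', 'Int', NULL, 'blue')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

This is the mechanism that prevents CELKO's "color attribute with the value 3.14159" scenario while keeping the row-per-answer shape.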

Damien

Don Miller

Nov 15, 2006, 9:09:20 AM
Thank you for the real-world example and description.

Creating extensive metadata and middle-layer logic is the key; fortunately,
once this hard work is done, the resulting EAV-modeled database can handle
anything that is thrown at it without changing the data model (e.g. adding
new tables, new columns, new relationships, etc.).

This message has been repeated over and over again in the literature by
Prakash Nadkarni MD, who has worked with biomedical data whose issues are
almost beyond comprehension (by me), and disputed by those who have never
been challenged with such issues and who say EAV modeling is "junk",
"nonsense", and "dangerous".

"Damien" <Damien_The...@hotmail.com> wrote in message
news:1163587617.8...@h48g2000cwc.googlegroups.com...

Damien

Nov 15, 2006, 10:31:05 AM
Don Miller wrote:

> Thank you for the real-world example and description.
>
> Creating extensive metadata and middle-layer logic is the key, and
> fortunately, once this hard work is done, the resulting EAV-modeled database
> can handle anything that is thrown at it without changing the data model
> (e.g. adding new tables, new columns, new relationships, etc.).
>
> This message has been repeated over and over again in the literature by
> Prakash Nadkarni MD who has worked with biomedical data with issues that are
> almost beyond comprehension (by me) and by those who have not ever been
> challenged with such issues who say EAV-modeling is "junk", "nonsense", and
> "dangerous".
>

I've done some googling on that name and read up a bit. Some
interesting stuff in there.

Have you managed to get the solution to your problems yet? (I think
Razvan Socol has answered everything I've seen from you.) I only really
barged into this thread to provide an example of an EAV that isn't
going to get filled with junk within a year.

Damien

Steve Dassin

Nov 15, 2006, 5:46:56 PM
"Don Miller" <nos...@nospam.com> wrote in message
news:uOwgEVOB...@TK2MSFTNGP03.phx.gbl...

> So what I need is an intersection of sets for n sets.

Precisely -:)

I've been working with the Dataphor system, an application
development system with a 'true' relational language. I use
MS SQL Server as the data storage system. I don't pretend to know
anything about EAV and I won't comment on its pros or cons. But for
anyone interested, here is how a relational system can simplify
the type of query in question.
create table Person
{
PersonID:Integer,
PersonName:String,
key{PersonID}
}

//This data is stored in MS Sql Server.
insert table
{
row{1 PersonID,'Mary' PersonName},
row{2,'Joe'},
row{3,'Paul'},
row{4,'John'}
}
into Person

create table FactTable
{
PersonID:Integer,
Attribute:String,
Avalue: String,
key{PersonID,Attribute},
reference FactTable_PersonID {PersonID}
references Person{PersonID}
}

Operator (procedure) S1 returns a table from an attribute/value string.

create operator S1(LStr:String):table{Attribute:String,Avalue:String}
begin
result:=
(
ToTable( (LStr).Split({','}),'Str' )
add {SubString(Str,0,Pos('/', Str)).Trim() Attribute,
SubString(Str,Pos('/', Str)+1).Trim() Avalue}
)
over{Attribute,Avalue} ;
end;

Example:
select S1(' City / Topeka , Education / College ');
/*
Attribute Avalue
--------- --------
City Topeka
Education College
*/

Operator S2 takes an attribute/value string consistent with operator
S1 and returns a table with all persons whose attribute(s)/value(s)
are a proper subset of FactTable. Each unique PersonID is relationally
added to the derived input table of attribute/value (S1) and this table
is relationally tested (<) to determine if it is a proper subset
of FactTable. S1+PersonID is a proper subset of FactTable if S1 and
FactTable are both of the same table type and all the rows in S1 are
in FactTable and there is at least one row in FactTable that is not in S1.

create operator S2(StrAVs:String):
table{
PersonID:Integer,PersonName:String,
FactAttribute:String,FactValue:String
}
begin
result:=
(
Person
where
( S1(StrAVs) add {PersonID PersonID} )
< //Test if S1+PersonID is proper subset of FactTable.
FactTable
with {IgnoreUnsupported = 'true'}
)
times //Like a cross join in sql. Cross join each row in the Person table
      //where S1 is a proper subset of FactTable with S1 to form the result.
      //A form of 'an intersection of sets for n sets.'
S1(StrAVs)
with {IgnoreUnsupported = 'true'}
{PersonID,PersonName,Attribute FactAttribute,Avalue FactValue} ;
end;

Example:

select S2(' City / Topeka , Education / College ')
/*
PersonID PersonName FactAttribute FactValue
-------- ---------- ------------- ---------
1        Mary       City          Topeka
1        Mary       Education     College
4        John       City          Topeka
4        John       Education     College
*/
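Outside Dataphor, the subset test at the heart of S2 can be sketched with plain sets. A small Python illustration over the thread's sample data (the `s2` helper and the data structures are my own, for illustration only):

```python
# Fact rows as a set of (PersonID, Attribute, Value) triples.
fact_rows = {
    (1, "City", "Topeka"), (1, "Size", "Small"), (1, "Education", "College"),
    (2, "Size", "Large"), (2, "Education", "College"),
    (3, "City", "Topeka"), (3, "Size", "Small"), (3, "Education", "High School"),
    (4, "City", "Topeka"), (4, "Education", "College"),
}
names = {1: "Mary", 2: "Joe", 3: "Paul", 4: "John"}

def s2(pairs):
    """Persons whose facts include every requested (attribute, value) pair,
    i.e. the requested pairs tagged with the PersonID form a subset of
    the fact rows -- an intersection of sets for n sets."""
    return [name for pid, name in sorted(names.items())
            if all((pid, a, v) in fact_rows for a, v in pairs)]

print(s2([("City", "Topeka"), ("Education", "College")]))  # Mary and John
```

Adding a third pair such as ("Size", "Small") narrows the result to Mary alone, mirroring the S2 example above.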

More info on Dataphor:
www.alphora.com

My blog including Dataphor:
http://racster.blogspot.com/

best,
steve

Steve Dassin

Nov 16, 2006, 2:22:46 AM
Consider the link that Anith posted about the drawbacks of
querying an EAV table:
http://tonyandrews.blogspot.com/2004/10/otlt-and-eav-two-big-design-mistakes.html

Given this sample data:

create table EMP
{ empno:Integer } ;

create table EMPVALUES
{
empno:Integer,
code :String,
value:String,
key{empno,code},
reference EMPVALUES_empno {empno}
references EMP{empno}
};

insert table
{
row{1234 empno},
row{1254},
row{1274},
row{1284}
}
into EMP;

insert table
{
row{1234 empno,'NAME' code,'ANDREWS' value},
row{1234,'SAL','1000'},
row{1234,'JOB','CLERK'},
row{1254,'JOB','CLERK'},
row{1254,'NAME','DUNN'},
row{1254,'SAL','2500'},
row{1274,'JOB','CASHIER'},
row{1274,'NAME','LYNN'},
row{1274,'SAL','1500'},
row{1284,'JOB','CLERK'},
row{1284,'NAME','JAMES'},
row{1284,'SAL','1850'}
}
into EMPVALUES;

The author poses this question:
'consider a simple query: show the names of all employees
who are clerks and earn less than 2000.

With the EAV tables:

select ev1.value
from emp_values ev1, emp_values ev2, emp_values ev3
where ev1.code = 'NAME'
and ev1.empno = ev2.empno
and ev2.code = 'JOB'
and ev2.value = 'CLERK'
and ev1.empno = ev3.empno
and ev3.code = 'SAL'
and TO_NUMBER(ev3.value) < 2000;'

Without going into detail about why SQL is not relational, note the
necessity of the cartesian product (3 copies of emp_values), one copy
for each code. SQL is all about indexes and access paths. But there
are no free lunches, and the price to pay for 'performance' is
the lack of any logical meaning for the concept of a key.
A key allows the logical addressing of data, independent
of the concept of an index. This is why the cartesian product
(cross join) is necessary in SQL. A relational system allows
direct access to the data via key(s), obviating any joins.
This seems well suited to the EAV model:) Anyway, here is
a very simple illustration of how the query could be written
in Dataphor, using keys to logically address data. The
where clause is based on constructing a row from EMPVALUES
and comparing it to a row that encapsulates a person
satisfying the question. Just as tables can be compared,
so can rows.

select
    (EMPVALUES where code='NAME' remove{code})
    having
    (
        EMP
        where
        (
            row
            {
                EMPVALUES[empno,'JOB'].value JOB,
                EMPVALUES[empno,'SAL'].value SAL
            }
            redefine{SAL:=ToInteger(SAL)<2000}
            with {IgnoreUnsupported = 'true'}
        )
        =
        row{'CLERK' JOB,true SAL}
    )

empno value
----- -------
1234  ANDREWS
1284  JAMES

Of course I didn't use any meta-data, operators or other goodies:)

The author states:
'And this is about the most simple of queries!'

If anyone has a 'complex' EAV query, let's party:)

best,
steve
http://racster.blogspot.com/
The question in the future will be Data..For or against:)
