EAV - again

Eric

unread,

Feb 5, 2015, 4:48:24 PM2/5/15

to

The recent mention of EAV in another thread reminded me of a presentation
I went to a year or so ago from a major vertical-market software supplier,
where one of the speakers actually boasted about their use of EAV!

They do other strange things as well, one of which is to (in effect)
disable the optimizer of the underlying DBMS. There might just be a
connection there :-)

Eric
--
ms fnd in a lbry

Eric

unread,

Feb 5, 2015, 4:50:44 PM2/5/15

to

We are used to hearing about EAV as a superficially clever idea which is
actually very bad. Not arguing with that at all. How might one look at
EAV?

1 as a way of providing end-user tailoring

2 as a way of adding entities and attributes without needing a database
Change Control

3 as a way of recording relationships between unstructured pieces of
data - http://geek-and-poke.com/?offset=1375995832310 (the data model
one) may be about this really

So,

* what is the right way to do 1?

* how do you argue against 2 (and win)?

* is 3 a legitimate need?

Erwin

unread,

Feb 6, 2015, 5:00:06 AM2/6/15

to

Op donderdag 5 februari 2015 22:50:44 UTC+1 schreef Eric:

What is your perceived difference between (1) and (2) ?.

Eric

unread,

Feb 6, 2015, 4:10:04 PM2/6/15

to

1 is software designed around EAV as a way of providing end-user
tailoring.

2 is developers trying to dodge the red tape they are supposed to work
with

Erwin

unread,

Feb 7, 2015, 11:05:36 AM2/7/15

to

Op vrijdag 6 februari 2015 22:10:04 UTC+1 schreef Eric:

That's cryptic.

EAV is typically and mostly used as a way to shove off the database design work onto the end user. I see that description fitting both your (1) and (2). Fitting with (1) because it provides a means for the end user to tailor the information model of this application. Fitting with (2) because, well, [that kind of] tailoring the information model is nothing else than "adding extra entities without database change control".

James K. Lowden

unread,

Feb 7, 2015, 2:44:38 PM2/7/15

to

On Fri, 6 Feb 2015 20:59:53 +0000
Eric <er...@deptj.eu> wrote:

> 1 is software designed around EAV as a way of providing end-user
> tailoring.

Never mind "tailoring", sometimes that's the whole design.

If you trundle around the world of "master data management" for a
little while, you'll soon find million-dollar systems with a single EAV
at the center and a giant application on top that lets the "user" --
what used to be called "application developer" -- define a database
cum application. It supports domains, constraints, and transformation
rules, and the selling point is that it's all ad hoc: you can do
anything you want, because the system neither enforces consistency of
any kind nor provides any means of checking it (other than user-written
reports).

Hurrah! No foreign key violations! Free at last!

And when management asks if it runs on Oracle, the answer of course is
Yes.

--jkl

James K. Lowden

unread,

Feb 7, 2015, 2:44:41 PM2/7/15

to

On Thu, 5 Feb 2015 21:50:34 +0000
Eric <er...@deptj.eu> wrote:

> We are used to hearing about EAV as a superficially clever idea which
> is actually very bad. Not arguing with that at all. How might one
> look at EAV?
>
> 1 as a way of providing end-user tailoring
>
> 2 as a way of adding entities and attributes without needing a
> database Change Control

Those two are the same, or nearly. They're both ways to add to the
schema without making the addition explicit.

> 3 as a way of recording relationships between unstructured pieces of
> data

Boy, do I dislike that term. Data are signal amid noise. Structure is
its distinguishing feature. Unstructured data don't exist.

Acknowledged, "unstructured" is the term people use for "data not
organized as tables". It's evidence of the influence of the
relational model on our thinking and assumptions. Even Stonebraker
refers to "semi-structured" data, knowing full well there's no halfway
point between structured and not. I imagine he uses it because it
conveys an idea.

I think what he means is really unanalyzed or poorly organized data. I
think that's a better term because it puts the emphaisis where it
belongs: not on the "nature" of the data, but on the undone work of
preparing them for analysis.

> * what is the right way to do 1?
> * how do you argue against 2 (and win)?

You cannot reason a man out of a position he did not reason himself
into.

Every organization I know of has policies that thwart the use of
database management systems. Administrative access is so tightly
controlled that often the DBAs themselves are required to acquire
temporary authority to do their jobs, and justify why. No one is
in charge of the logical design. Changes are never measured against
correctness or coherency, but by whether or not "anything breaks", a
question the organization devotes precious little resources to make
possible to answer a priori. Stupidity predictably breeds conservatism,
and the hairball only grows.

The organizational structure, by actively preventing use of the
management features in the DBMS, layers cost upon cost: controlling the
administrators instead of requiring them to control (and exploit) the
system. In such an Alice-in-Wonderland world, what does "win" mean?

The right way to do it is to make the features of the DBMS accessible
to the application and the user in controlled ways. Most DBMSs today
support namespaces of some kind, e.g. "roles", that permit database
objects that are seen only by a particular group. Let users and
applications define tables according to their "needs" as they see
them. Surely there's no more harm in that than in providing only an
EAV and foreclosing any opportunity for database management.

--jkl

Eric

unread,

Feb 8, 2015, 7:50:14 AM2/8/15

to

Yes, that's one end of the spectrum, and yes, it exists. Thankyou.

Eric

unread,

Feb 8, 2015, 7:50:15 AM2/8/15

to

Fits 2 up to a point, but it is more "make my life easier" or "hey user,
I think I can short-circuit this to get it in production sooner".

Eric

unread,

Feb 8, 2015, 7:50:18 AM2/8/15

to

On 2015-02-07, James K. Lowden <jklo...@speakeasy.net> wrote:

> On Thu, 5 Feb 2015 21:50:34 +0000
> Eric <er...@deptj.eu> wrote:
>
>> We are used to hearing about EAV as a superficially clever idea which
>> is actually very bad. Not arguing with that at all. How might one
>> look at EAV?
>>
>> 1 as a way of providing end-user tailoring
>>
>> 2 as a way of adding entities and attributes without needing a
>> database Change Control
>
> Those two are the same, or nearly. They're both ways to add to the
> schema without making the addition explicit.

In 1 the developer(s) provide something for others to use, in 2 they
are providing something for themselves to use, but yes, there is a lot
of overlap.

>> 3 as a way of recording relationships between unstructured pieces of
>> data
>
> Boy, do I dislike that term. Data are signal amid noise. Structure is
> its distinguishing feature. Unstructured data don't exist.
>
> Acknowledged, "unstructured" is the term people use for "data not
> organized as tables". It's evidence of the influence of the
> relational model on our thinking and assumptions. Even Stonebraker
> refers to "semi-structured" data, knowing full well there's no halfway
> point between structured and not. I imagine he uses it because it
> conveys an idea.
>
> I think what he means is really unanalyzed or poorly organized data. I
> think that's a better term because it puts the emphaisis where it
> belongs: not on the "nature" of the data, but on the undone work of
> preparing them for analysis.

Perhaps the term should be "non-data", and I am thinking of a picture,
or a poem, or a newspaper article which, these days, are stored in a
computer system. Of course there will be meta-data for them which, from
out point of view, is data and should be properly structured. But how do
we say, in our system, that a picture is linked to a poem because it is
claimed to be the inspiration for the poem. This pretty much has to be
a generic link because you can't predict what link descriptions people
will want, so a schema handling this is going to look like EAV, isn't it?

>> * what is the right way to do 1?
>> * how do you argue against 2 (and win)?
>
> You cannot reason a man out of a position he did not reason himself
> into.
>
> Every organization I know of has policies that thwart the use of
> database management systems. Administrative access is so tightly
> controlled that often the DBAs themselves are required to acquire
> temporary authority to do their jobs, and justify why. No one is
> in charge of the logical design. Changes are never measured against
> correctness or coherency, but by whether or not "anything breaks", a
> question the organization devotes precious little resources to make
> possible to answer a priori. Stupidity predictably breeds conservatism,
> and the hairball only grows.
>
> The organizational structure, by actively preventing use of the
> management features in the DBMS, layers cost upon cost: controlling the
> administrators instead of requiring them to control (and exploit) the
> system. In such an Alice-in-Wonderland world, what does "win" mean?

There are organisations like that, and you can't win (in any sense) at
the level we are at here.

> The right way to do it is to make the features of the DBMS accessible
> to the application and the user in controlled ways. Most DBMSs today
> support namespaces of some kind, e.g. "roles", that permit database
> objects that are seen only by a particular group. Let users and
> applications define tables according to their "needs" as they see
> them. Surely there's no more harm in that than in providing only an
> EAV and foreclosing any opportunity for database management.

Yes, but even in an organisation trying do do it properly, and with
someone in charge of logical design who was both competent and
reasonable, I have seen a case of 2 - maybe it was just out of habit,
but still...

James K. Lowden

unread,

Feb 8, 2015, 6:59:04 PM2/8/15

to

On Sun, 8 Feb 2015 12:44:20 +0000
Eric <er...@deptj.eu> wrote:

> > I think what he means is really unanalyzed or poorly organized
> > data. I think that's a better term because it puts the emphaisis
> > where it belongs: not on the "nature" of the data, but on the
> > undone work of preparing them for analysis.
>
> Perhaps the term should be "non-data", and I am thinking of a picture,
> or a poem, or a newspaper article which, these days, are stored in a
> computer system. Of course there will be meta-data for them which,
> from out point of view, is data and should be properly structured.
> But how do we say, in our system, that a picture is linked to a poem
> because it is claimed to be the inspiration for the poem. This pretty
> much has to be a generic link because you can't predict what link
> descriptions people will want, so a schema handling this is going to
> look like EAV, isn't it?

Not at all. You're describing three tables: poems, pictures, and
inspirations. Poems and Pictures are N:M, and inspirations hold keys
to both.

But you know I'm going to shoot down every proposal, right? Because
the answer is always per above, unravelling the EAV rows to define
a table structure. :-)

> even in an organisation trying do do it properly, and with
> someone in charge of logical design who was both competent and
> reasonable, I have seen a case of 2 - maybe it was just out of habit,
> but still...

I'll give you a better example from those who we have every reason to
believe know better. Microsoft SQL Server supports "extended
properties" on database objects, cf.
"sp_addextendedproperty" (https://msdn.microsoft.com/en-us/library/ms180047.aspx).
Note the @value parameter is "sql_variant". These properties are
nothing more nor less than a list of name-value pairs associated with
the object. I say "associated" advisedly; they aren't part of the
schema defintion insofar as they do *not* appear as columns in the
catalog.

Many times over the years I wished for a data dictionary that was an
intrinsic part of the DBMS. I wanted to define provenance, ownership,
administrative responsibility, application use, textual (or
mathematical) description, etc. I wanted to be able to verify
completeness and enforce dictionary requirements as part of database
administration, to be able to say, e.g., how many columns lacked a
description. I thought it would be useful for developers to be able to
scan all the descriptions for, say, "trade date" without regard to the
column's name.

SQL Server's extended properties seem at first blush to be the obvious
place to house a data dictionary. Except that one quickly notices they
are pointlessly orthogonal to the schema. Instead of a powerful way to
extend the system catalog to add meaning and enforce policy, we
get a list of nearly invisible name-value pairs. A repeating group right
there in the catalog. Brilliant. Thanks.

I like to imagine the feature-design group at Microsoft discussing my
obvious proposal and deciding on your reason #1: no, we can't have
ordinary users mucking with the system catalog. Too powerful, could
have unforeseen consequences. Just let them attach a little EAV to each
thing. It satisfies the request to be able to extend the catalog, but
not in a way that could interfere with its proper use. And, you know,
it's like totally flexible.

--jkl

Erwin

unread,

Feb 9, 2015, 5:08:01 AM2/9/15

to

Op maandag 9 februari 2015 00:59:04 UTC+1 schreef James K. Lowden:
>
> ... I thought it would be useful for developers to be able to

> scan all the descriptions for, say, "trade date" without regard to the
> column's name.
>

> --jkl

Hihi.

In SIRA_PRISE :

RESTRICT(RELVAR, MATCHES(RELVARPREDICATE,STRING(.*trade date.*))

Besides. Doesn't recent SQL have a COMMENT clause (that itself gets recorded too and is therefore searchable) for every single catalog object affected by any piece of DDL (at least for the components of a logical model) ?

Erwin

unread,

Feb 9, 2015, 1:52:55 PM2/9/15

to

Op maandag 9 februari 2015 11:08:01 UTC+1 schreef Erwin:

> Op maandag 9 februari 2015 00:59:04 UTC+1 schreef James K. Lowden:
> >
> > ... I thought it would be useful for developers to be able to
> > scan all the descriptions for, say, "trade date" without regard to the
> > column's name.
> >
> > --jkl
>
> Hihi.
>
> In SIRA_PRISE :
>
> RESTRICT(RELVAR, MATCHES(RELVARPREDICATE,STRING(.*trade date.*))

typo

RESTRICT(RELVAR, MATCHES(RELVARPREDICATE,STRING(.*trade date.*)))

Erwin

unread,

Feb 9, 2015, 2:05:09 PM2/9/15

to

Op donderdag 5 februari 2015 22:50:44 UTC+1 schreef Eric:

The "right" way to do 1) is through dynamic DDL.

The end-user creating the "end-user entity" leads to a CREATE TABLE.
The end-user adding a new attribute to his "end-user entity" leads to an ALTER TABLE.
Screens displaying the information in this user-table get the meta-information they need by reading the catalog.
Ditto for screens/programs editing the information. Protection against injection may be a major concern here.
The security system must facilitate end-users being granted DBA privileges (which is logical because that's the job they're doing anyway), possibly in a selective fashion.

But the advantage is that any user with access to this database can then use any regular SQL tool (reporting, DWH extraction suites, ...) to access the end-user tables just like any of the other ones.

Eric

unread,

Feb 9, 2015, 4:40:04 PM2/9/15

to

That probably is the right way.

Eric

unread,

Feb 9, 2015, 4:40:05 PM2/9/15

to

On 2015-02-08, James K. Lowden <jklo...@speakeasy.net> wrote:
> On Sun, 8 Feb 2015 12:44:20 +0000
> Eric <er...@deptj.eu> wrote:
>
>> > I think what he means is really unanalyzed or poorly organized
>> > data. I think that's a better term because it puts the emphaisis
>> > where it belongs: not on the "nature" of the data, but on the
>> > undone work of preparing them for analysis.
>>
>> Perhaps the term should be "non-data", and I am thinking of a picture,
>> or a poem, or a newspaper article which, these days, are stored in a
>> computer system. Of course there will be meta-data for them which,
>> from out point of view, is data and should be properly structured.
>> But how do we say, in our system, that a picture is linked to a poem
>> because it is claimed to be the inspiration for the poem. This pretty
>> much has to be a generic link because you can't predict what link
>> descriptions people will want, so a schema handling this is going to
>> look like EAV, isn't it?
>
> Not at all. You're describing three tables: poems, pictures, and
> inspirations. Poems and Pictures are N:M, and inspirations hold keys
> to both.
>
> But you know I'm going to shoot down every proposal, right? Because
> the answer is always per above, unravelling the EAV rows to define
> a table structure. :-)

Of course you will, for any specific case I give you. But what if the
number of possible links is potentially very large, because end-users
can just create them at will between any two blobby things they have
found?

I had to look that one up - aaaaargh!