SQL Select Problem

Frank Uray

unread,

Dec 21, 2009, 10:35:01 AM12/21/09

to

Hi all

I have a little problem with a select.
I would need to have the most simple solution :-)

I have the following table:
[FK_Candidate], [Key], [Value]
1 'Profession' 'Informatiker'
2 'Profession' 'Informatiker'
1 'Source' 'Quelle1'
2 'Source' 'Quelle2'

I need to get all candidates with
Profession = Informatiker AND Source = Quelle1

The result should look like this:
1 'Profession' 'Informatiker'
1 'Source' 'Quelle1'

Any ideas ?
Thanks a lot for any help !

Best regards
Frank Uray

Script for testing:
CREATE TABLE #test ([FK_Candidate] int, [Key] varchar(50), [Value]
varchar(50))
INSERT INTO #test SELECT 1, 'Profession', 'Informatiker'
INSERT INTO #test SELECT 2, 'Profession', 'Informatiker'
INSERT INTO #test SELECT 1, 'Source', 'Quelle1'
INSERT INTO #test SELECT 2, 'Source', 'Quelle2'

Stefan Hoffmann

unread,

Dec 21, 2009, 10:44:16 AM12/21/09

to

hi Frank,

On 21.12.2009 16:35, Frank Uray wrote:
> I need to get all candidates with
> Profession = Informatiker AND Source = Quelle1
>
> The result should look like this:
> 1 'Profession' 'Informatiker'
> 1 'Source' 'Quelle1'

E.g.:

SELECT *
FROM #test
WHERE [FK_Candidate] IN
(
SELECT [FK_Candidate]
FROM #test
WHERE [Key] = 'Profession' AND [Value] = 'Informatiker'
INTERSECT
SELECT [FK_Candidate]
FROM #test
WHERE [Key] = 'Source' AND [Value] = 'Quelle1'

)

mfG
--> stefan <--

Plamen Ratchev

unread,

Dec 21, 2009, 1:50:08 PM12/21/09

to

Here is another method:

SELECT A.FK_Candidate, A.[Key], A.[Value]
FROM #test AS A
WHERE EXISTS(SELECT *
FROM #test AS B
WHERE B.FK_Candidate = A.FK_Candidate
AND B.[Key] = 'Profession'
AND B.[Value] = 'Informatiker')
AND EXISTS(SELECT *
FROM #test AS C
WHERE C.FK_Candidate = A.FK_Candidate
AND C.[Key] = 'Source'
AND C.[Value] = 'Quelle1');

--
Plamen Ratchev
http://www.SQLStudio.com

Frank Uray

unread,

Dec 21, 2009, 2:01:01 PM12/21/09

to

Hi Stefan

Thank you for your answer.

If there are only two keys, it works.
But when you add
INSERT INTO #test SELECT 1, 'TestKey', 'TK1'
INSERT INTO #test SELECT 2, 'TestKey', 'TK2'
it does not work any more.

Do you have another idea ?

Thanks and best regards
Frank Uray

"Stefan Hoffmann" wrote:

> .
>

Gert-Jan Strik

unread,

Dec 21, 2009, 3:34:44 PM12/21/09

to

Frank,

I think the problem is not the SELECT, but the table design. Is there
any particular reason why the design is not normalized? If the
information belong together, and belongs to the same object, then it
could be in the same row, instead of using the "flexible" key/value
storage.

The key/value storage gives you challenges like the query you are after
(which requires "relational division", google that and you will find
good solutions). Also, it makes it very hard to implement any kind of
constraints on the data integrity, such as foreign keys, strong data
typing, domain checks, etc. And finally, from a performance point of
view, you might become very disappointed, because the optimizer is not
particularly well build to optimizer these nonstandard situations.

--
Gert-Jan
SQL Server MVP

Frank Uray

unread,

Dec 21, 2009, 5:57:01 PM12/21/09

to

Hi Plamen

Thanks a lot for your answer.

I have implemented your solution and it works fine :-))

Best regards
Frank Uray

"Plamen Ratchev" wrote:

> .
>

Frank Uray

unread,

Dec 21, 2009, 6:00:01 PM12/21/09

to

Hi Gert-Jan

Thanks a lot for your answer.

My datamodel is normalized, the Key and the Value are
in different tables with relation and constraints.
I have simplified it just for testing.

I have implemented the following solution now.

Thanks and best regards
Frank Uray

CREATE TABLE #test ([FK_Candidate] int, [Key] varchar(50), [Value]

varchar(50))
INSERT INTO #test SELECT 1, 'Profession', 'Informatiker'
INSERT INTO #test SELECT 2, 'Profession', 'Informatiker'
INSERT INTO #test SELECT 1, 'Source', 'Quelle1'
INSERT INTO #test SELECT 2, 'Source', 'Quelle2'

INSERT INTO #test SELECT 1, 'TestKey', 'X'
INSERT INTO #test SELECT 2, 'TestKey', 'Y'
INSERT INTO #test SELECT 1, 'Title', 'Mr'
INSERT INTO #test SELECT 2, 'Title', 'Mrs'

SELECT DISTINCT [FK_Candidate]
FROM #test q

WHERE EXISTS(SELECT *
FROM #test

WHERE [FK_Candidate] = q.[FK_Candidate]
AND [Key] = 'Profession'
AND [Value] = 'Informatiker')

AND EXISTS(SELECT *
FROM #test

WHERE [FK_Candidate] = q.[FK_Candidate]
AND [Key] = 'Source'
AND [Value] = 'Quelle1')

AND EXISTS(SELECT *
FROM #test

WHERE [FK_Candidate] = q.[FK_Candidate]
AND [Key] = 'Title'
AND [Value] = 'Mr')

"Gert-Jan Strik" wrote:

> .
>

Plamen Ratchev

unread,

Dec 21, 2009, 6:37:14 PM12/21/09

to

Another solution you may want to consider is pivoting (but it will return the data in different format):

SELECT FK_Candidate, profession, [source]
FROM (
SELECT FK_Candidate,
MAX(CASE WHEN [Key] = 'Profession' THEN [Value] END) AS profession,
MAX(CASE WHEN [Key] = 'Source' THEN [Value] END) AS [source]
FROM #test
GROUP BY FK_Candidate) AS T
WHERE profession = 'Informatiker'
AND [source] = 'Quelle1';

Frank Uray

unread,

Dec 21, 2009, 7:26:01 PM12/21/09

to

Hi Plamen

I have tried pivoting already,
it works also but I am generating the SQL Statement
dynamicly in C# and with the EXISTS it is much easyer to create.

But thanks anyway !

Regards
Frank Uray

SELECT *
FROM #test
PIVOT (MAX([Value])
FOR [Key] IN ([Profession],[Source])) p
WHERE [Profession] = 'Informatiker'
AND [Source] = 'Quelle1'

"Plamen Ratchev" wrote:

> .
>

Stefan Hoffmann

unread,

Dec 22, 2009, 5:15:12 AM12/22/09

to

hi Frank,

On 21.12.2009 20:01, Frank Uray wrote:
> If there are only two keys, it works.
> But when you add
> INSERT INTO #test SELECT 1, 'TestKey', 'TK1'
> INSERT INTO #test SELECT 2, 'TestKey', 'TK2'
> it does not work any more.
>
> Do you have another idea ?

Not sure that I understand your problem here. If you need the only the
key-value pairs then simply filter the result:

WITH CTE
AS ( SELECT *
FROM #test
WHERE ( [Key] = 'Profession'
AND [Value] = 'Informatiker'
)
OR ( [Key] = 'Source'

AND [Value] = 'Quelle1'
)

)
SELECT *
FROM CTE

WHERE [FK_Candidate] IN ( SELECT [FK_Candidate]

FROM CTE

WHERE [Key] = 'Profession'
AND [Value] = 'Informatiker'
INTERSECT
SELECT [FK_Candidate]

FROM CTE
WHERE [Key] = 'Source'
AND [Value] = 'Quelle1' ) ;

Using a CTE for filtering has the better execution plan.

mfG
--> stefan <--

Gert-Jan Strik

unread,

Dec 22, 2009, 5:56:41 AM12/22/09

to

Frank Uray wrote:
> My datamodel is normalized, the Key and the Value are
> in different tables with relation and constraints.
> I have simplified it just for testing.

O boy. If I understand you correctly, you are saying that you have a
table called Keys in which you store the names of the keys (such as
"Profession") and a table called Values in which you store one or more
values of this key for the Candidate (such as "Informatiker").

If that is the case, then your model is not normalized, and it is (as
David noted) the EAV anti-pattern.

You say you have relations and constraints. So tell me, how do you
prevent someone from entering a "Title" with value "Informatiker" in the
database? Or let's say you also have a key called "DateOfBirth". How do
you prevent values like "2E3" and "1st of January 2 thousand" for this
key?

Or let's say that a Candidate should at least have a "Title" and a
"Profession". How would you prevent Candidates with just a "Title", or
just a "Profession"?

What happens if someone changes the Key's name from "Profession" to
"Education"?

I already know your answer! Because with your current design, you cannot
(easily) solve this in the database. So it must be solved in the
application layer. All the unwanted situations that you allow in the
database become a burdon for the developer who will then have to write
complex queries to filter out any incomplete entities (in your case
Candidates). The developer will have to code and manage the business
rules, even the business rules about what constitutes a "Candicate".

After a short while, this approach will likely fail. A developer will
"forget" some of the rules, or someone will change data in the database
directly (without using your application), etc. If something goes wrong,
it might be very difficult to determine what data is correct and what
isn't. It is very easy with this model to create (or end up in) a
nightmare scenario.

If you still think the Entity-Attribute-Value approach is the right one
for you, then I wish you good luck.

If you realize the potential problems and want to avoid it, then it
would be best to redesign the database. If that is beyond your control,
then you could build views to pivot the unnormalized data to a
normalized set. In your case, the view could look something like this
(partially copied from Plamen). You want to list all the columns in this
view that belong to a Candidate.

CREATE VIEW Candidate AS

SELECT FK_Candidate,
MAX(CASE WHEN [Key] = 'Profession' THEN [Value] END) AS
profession,
MAX(CASE WHEN [Key] = 'Source' THEN [Value] END) AS [source]

FROM Keys JOIN Values ON Keys.key_id = Values.key_id
GROUP BY FK_Candidate

After that, you at least make it a bit easier for you developer to write
queries, because it allows simple queries again, like

SELECT FK_Candidate
FROM Candidate

WHERE profession = 'Informatiker'
AND "source" = 'Quelle1'

--
Gert-Jan
SQL Server MVP