SQL question

Alain Quesnel

unread,

Nov 21, 2009, 9:49:20 AM11/21/09

to

I'm trying to retrieve the establishments that have at least one record with
a value of 1, 2 and 3 in CodeFonctionCategorie. I don't care how many
records there are per establishment. In the example below, I want to
retrieve Establishments 612, 8401, 25538 and 45935. Establishment 3659 only
has records with 2 and 3, and establishment 42650 only has 1 and 2.

Has anyone ever done something similar? I'm trying to avoid the use of a
cursor. I came up with this:

select TMP.CodeEtablissement, count(*) from
(
select distinct CodeEtablissement, CodeFonctionCategorie
from TableXYZ
)
as TMP
group by TMP.CodeEtablissement
having count(*) = 3

but it seems unflexible (what if the 1, 2, 3 criteria changes?). It is
pretty fast, however. It takes less than one second against 45000
establishments, 261 of which have 1, 2 and 3.

Thank you

CodeEtablissement CodeFonctionCategorie
612 1
612 2
612 2
612 2
612 3
3659 2
3659 2
3659 3
3659 3
3659 3
3659 3
8401 1
8401 1
8401 2
8401 2
8401 3
8401 3
25538 1
25538 2
25538 2
25538 2
25538 3
42650 1
42650 2
42650 2
42650 2
42650 2
45935 1
45935 1
45935 2
45935 3
45935 3
45935 3

--

Alain Quesnel
alains...@logiquel.com

www.logiquel.com

Plamen Ratchev

unread,

Nov 21, 2009, 10:14:39 AM11/21/09

to

This problems is called relational division and the solution you have is pretty much a standard way to handle it.

To avoid hard coding values you can create a table to store the desired search codes and then the query will look like this:

CREATE TABLE Codes (
CodeFonctionCategorie INT NOT NULL PRIMARY KEY);

INSERT INTO Codes VALUES(1);
INSERT INTO Codes VALUES(2);
INSERT INTO Codes VALUES(3);

SELECT CodeEtablissement
FROM TableXYZ
WHERE CodeFonctionCategorie IN (SELECT CodeFonctionCategorie
FROM Codes)
GROUP BY CodeEtablissement
HAVING COUNT(DISTINCT CodeFonctionCategorie) = (SELECT COUNT(*) FROM Codes);

--
Plamen Ratchev
http://www.SQLStudio.com

Erland Sommarskog

unread,

Nov 21, 2009, 10:24:06 AM11/21/09

to

Alain Quesnel (alains...@logiquel.com) writes:
> I'm trying to retrieve the establishments that have at least one record
> with a value of 1, 2 and 3 in CodeFonctionCategorie.

That is, the establishment needs to have at least one row of each
value to qualify?

> I don't care how many records there are per establishment. In the
> example below, I want to retrieve Establishments 612, 8401, 25538 and
> 45935. Establishment 3659 only has records with 2 and 3, and
> establishment 42650 only has 1 and 2.
>
> Has anyone ever done something similar? I'm trying to avoid the use of a
> cursor. I came up with this:
>
> select TMP.CodeEtablissement, count(*) from
> (
> select distinct CodeEtablissement, CodeFonctionCategorie
> from TableXYZ
> )
> as TMP
> group by TMP.CodeEtablissement
> having count(*) = 3
>
> but it seems unflexible (what if the 1, 2, 3 criteria changes?). It is
> pretty fast, however. It takes less than one second against 45000
> establishments, 261 of which have 1, 2 and 3.

The above does not even specify the values 1, 2, 3 but will catch
establishments with values 5, 6 and 7 as well. But not if they have
1, 2, 3 and 5. But I assume that 1-3 are the only values in this case.

In any case, you have the basic structure correct. You can make it
shorter with:

SELECT CodeEtablissement, COUNT(DISTINCT CodeFonctionCategorie)
FROM TableXYZ
GROUP BY CodeEtablissement
HAVING COUNT(DISTINCT CodeFonctionCategorie) = 3

I would not expect any difference in performance, though.

For more flexibility:

SELECT CodeEstablishment, COUNT(DISTINCT CodeFonctionCategorie)
FROM TableXYZ t
JOIN @values v ON t.CodeFonctionCategorie = v.value

GROUP BY CodeEtablissement
HAVING COUNT(DISTINCT CodeFonctionCategorie) =

(SELECT COUNT(*) FROM @values)

Where I assume that @values includes the values you are looking for.

--
Erland Sommarskog, SQL Server MVP, esq...@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
SQL 2000: http://www.microsoft.com/sql/prodinfo/previousversions/books.mspx

--CELKO--

unread,

Nov 21, 2009, 11:20:37 AM11/21/09

to

Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, data types, etc. in
your schema are. If you know how, follow ISO-11179 data element naming
conventions and formatting rules. Temporal data should use ISO-8601
formats. Code should be in Standard SQL as much as possible and not
local dialect.

Sample data is also a good idea, along with clear specifications. It
is very hard to debug code when you do not let us see it. If you want
to learn how to ask a question on a Newsgroup, look at:
http://www.catb.org/~esr/faqs/smart-questions.html

The short answer is that this is a relational division problem.
Relational Division was one of Dr. Codd's eight original operators.

>> I'm trying to retrieve the establishments that have at least one record [sic] witha value of 1, 2 and 3 in CodeFonctionCategorie. <<

Rows are not records -- totally different concept. A properly
declared table has a key in its rows. But the sample data you posted
is not a proper table! It has no key and is full of redundant
duplicate rows.

The BETTER answer is what kind of relational division? Plamen gave
you a version of Celko's Relational Division, but there is also Date's
Relational Division:

WITH Divisor (CodeFonctionCategorie)
AS
(SELECT CodeFonctionCategorie
FROM (VALUES (1), (2), (3)) -- add search values here.
AS X (CodeFonctionCategorie)

SELECT DISTINCT CodeEtablissement
FROM PseudoTable AS PS1
WHERE NOT EXISTS
(SELECT *
FROM Divisor
WHERE NOT EXISTS
(SELECT *
FROM PseudoTableAS PS2
WHERE (PS1.CodeEtablissement
= PS2.CodeEtablissement)
AND (PS2.CodeFonctionCategorie
= Divisor.CodeFonctionCategorie)));

The Winter 1996 edition of DB2 ON-LINE MAGAZINE (http://www.db2mag.com/
96011ar:htm) had an article entitled "Powerful SQL:Beyond the Basics"
by Sheryl Larsen which gave the results of testing both methods. Her
conclusion for DB2 was that the nested EXISTS() version is better when
the quotient has less than 25% of the dividend table's rows and the
COUNT(*) version is better when the quotient is more than 25% of the
dividend table.

But you need to test for yourself.

alain

unread,

Nov 26, 2009, 10:04:43 AM11/26/09

to

Thank you all for your replies.

Based on your answers (and one from the XML group), I came up with a
solution that's flexible and fast. If the list of function categories
ever changes and/or grows in size, all I need to do is feed a
different XML fragment to my function. The DDL for the tables is
posted further down. You can download a TEST DB here:
http://www.logiquel.com/downloads/SQL/TEST_RGN.zip . I tested the
update with 49000 establishments, and it takes less than a second. The
script will probably be converted into a stored procedure and then run
from a scheduled task.

create function dbo.GetCategories(@inXML xml)
returns @table table(CodeFonctionCategorie int primary key)
as
begin
insert into @table
SELECT List.CodeFonctionCategorie.value('.', 'int') AS Cat
from @inXml.nodes('(/ROOT/CodeFonctionCategorie)') AS List
(CodeFonctionCategorie)

RETURN
end

GO

declare @xml xml
set @xml =
N'<ROOT>
<CodeFonctionCategorie>1</CodeFonctionCategorie>
<CodeFonctionCategorie>2</CodeFonctionCategorie>
<CodeFonctionCategorie>3</CodeFonctionCategorie>
</ROOT>'

update dbo.RGN_Etablissements
set EquipeComplete = 1
where CodeEtablissement in
(
SELECT CodeEtablissement
FROM RGN_Etablissements_IntervenantsResponsables EI
inner join RGN_Intervenants I on
I.CodeIntervenant = EI.CodeIntervenant
inner join RGN_Fonctions_Intervenants FI on
FI.CodeFonctionIntervenant = I.CodeFonctionIntervenant
WHERE CodeFonctionCategorie IN (select CodeFonctionCategorie from
dbo.GetCategories(@xml))

GROUP BY CodeEtablissement
HAVING COUNT(DISTINCT CodeFonctionCategorie) = (SELECT COUNT(*) FROM

dbo.GetCategories(@xml))
)

--DDL:
CREATE TABLE dbo.RGN_Fonctions_Categories(
CodeFonctionCategorie int PRIMARY KEY IDENTITY NOT NULL,
FonctionCategorieDescription nvarchar(50) NOT NULL)
GO
CREATE TABLE dbo.RGN_Etablissements(
CodeEtablissement int PRIMARY KEY IDENTITY NOT NULL,
NomEtablissement nvarchar(75) NOT NULL,
EquipeComplete bit NULL)
GO
CREATE TABLE dbo.RGN_Fonctions_Intervenants(
CodeFonctionIntervenant int PRIMARY KEY IDENTITY NOT NULL,
CodeFonctionCategorie int NOT NULL,
FonctionDescription nvarchar(50) NOT NULL)
GO
CREATE TABLE dbo.RGN_Intervenants(
CodeIntervenant int PRIMARY KEY IDENTITY NOT NULL,
CodeFonctionIntervenant int NOT NULL)
GO
CREATE TABLE dbo.RGN_Etablissements_IntervenantsResponsables(
CodeIntervenantResponsable int PRIMARY KEY IDENTITY NOT NULL,
CodeEtablissement int NOT NULL,
CodeIntervenant int NOT NULL)
GO
ALTER TABLE dbo.RGN_Fonctions_Intervenants WITH CHECK ADD CONSTRAINT
FK_RGN_Fonctions_Intervenants_RGN_Fonctions_Categories FOREIGN KEY
(CodeFonctionCategorie)
REFERENCES dbo.RGN_Fonctions_Categories (CodeFonctionCategorie)
GO
ALTER TABLE dbo.RGN_Intervenants WITH CHECK ADD CONSTRAINT
FK_RGN_Intervenants_RGN_Fonctions_Intervenants FOREIGN KEY
(CodeFonctionIntervenant)
REFERENCES dbo.RGN_Fonctions_Intervenants (CodeFonctionIntervenant)
GO
ALTER TABLE dbo.RGN_Etablissements_IntervenantsResponsables WITH
CHECK ADD CONSTRAINT
FK_RGN_Etablissements_IntervenantsResponsables_RGN_Etablissements
FOREIGN KEY(CodeEtablissement)
REFERENCES dbo.RGN_Etablissements (CodeEtablissement)
GO
ALTER TABLE dbo.RGN_Etablissements_IntervenantsResponsables WITH
CHECK ADD CONSTRAINT
FK_RGN_Etablissements_IntervenantsResponsables_RGN_Intervenants
FOREIGN KEY(CodeIntervenant)
REFERENCES dbo.RGN_Intervenants (CodeIntervenant)
GO

Alain Quesnel

unread,

Nov 26, 2009, 10:27:33 AM11/26/09

to

I replied through Google Groups (
http://groups.google.com/group/microsoft.public.sqlserver.programming/topics
). When I post and/or reply through Windows Live Mail, my posts don't show
up. Also, in Google Groups I could see three replies to my original posts
(including yours). In Windows Live Mail, I can only see your reply. I don't
see my original post, or the other two replies.

If I look at
http://www.microsoft.com/communities/newsgroups/list/en-us/default.aspx?pg=3&cat=&lang=&cr=&guid=&sloc=en-us&dg=microsoft.public.sqlserver.programming&fltr=
I only see my original post and your reply.

Do you know why that is?

--
Thank you,

Alain Quesnel
alains...@logiquel.com
www.logiquel.com

"--CELKO--" <jcel...@earthlink.net> wrote in message
news:df1e776a-ace1-4289...@d21g2000yqn.googlegroups.com...