I have three tables: Members, Fruits, MemberFruits.
Each member may have more than 1 favorite fruits.
Using a set based query, I can only get this:
MemberID MemberName FruitName
-------------- --------------------- --------------
1 Al Apple
1 Al Cherry
1 Al Fig
1 Al Grape
But, I would like to concatenate the fruit names and stick it into
once cell for each member, like what's shown below, this only makes my
life a little easier when I need to bind the result to a asp.net
server control.
MemberID MemberName FruitName
----------- -------------- ------------
1 Al Apple, Cherry, Fig, Grap
Here goes the DDL and thanks a lot.
CREATE TABLE [dbo].[Members](
[MemberID] [int] NOT NULL,
[Name] [varchar](32) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
CONSTRAINT [PK_Members] PRIMARY KEY CLUSTERED
(
[MemberID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
Go
CREATE TABLE [dbo].[Fruits](
[FruitID] [int] NOT NULL,
[Name] [varchar](32) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
CONSTRAINT [PK_Fruits] PRIMARY KEY CLUSTERED
(
[FruitID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
Go
CREATE TABLE [dbo].[MemberFruits](
[MemberID] [int] NOT NULL,
[FruitID] [int] NOT NULL,
CONSTRAINT [PK_MemberFruits] PRIMARY KEY CLUSTERED
(
[MemberID] ASC,
[FruitID] ASC
)WITH (IGNORE_DUP_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
ALTER TABLE [dbo].[MemberFruits] WITH CHECK ADD CONSTRAINT
[FK_MemberFruits_Fruits] FOREIGN KEY([FruitID])
REFERENCES [dbo].[Fruits] ([FruitID])
GO
ALTER TABLE [dbo].[MemberFruits] WITH CHECK ADD CONSTRAINT
[FK_MemberFruits_Members] FOREIGN KEY([MemberID])
REFERENCES [dbo].[Members] ([MemberID])
INSERT INTO Members VALUES(1, 'Al')
INSERT INTO Members VALUES(2, 'Betty')
INSERT INTO Members VALUES(3, 'Chris')
INSERT INTO Members VALUES(4, 'David')
INSERT INTO Members VALUES(5, 'Erin')
INSERT INTO Members VALUES(6, 'Frank')
INSERT INTO Members VALUES(7, 'Gary')
INSERT INTO Members VALUES(8, 'Henry')
INSERT INTO Members VALUES(9, 'Ian')
INSERT INTO Members VALUES(10, 'Jerry')
INSERT INTO Fruits VALUES(1, 'Apple')
INSERT INTO Fruits VALUES(2, 'Banana')
INSERT INTO Fruits VALUES(3, 'Cherry')
INSERT INTO Fruits VALUES(4, 'Date')
INSERT INTO Fruits VALUES(5, 'Eggfruit')
INSERT INTO Fruits VALUES(6, 'Fig')
INSERT INTO Fruits VALUES(7, 'Grape')
INSERT INTO MemberFruits VALUES(1, 1)
INSERT INTO MemberFruits VALUES(1, 3)
INSERT INTO MemberFruits VALUES(1, 6)
INSERT INTO MemberFruits VALUES(1, 7)
INSERT INTO MemberFruits VALUES(2, 2)
INSERT INTO MemberFruits VALUES(2, 5)
INSERT INTO MemberFruits VALUES(2, 7)
INSERT INTO MemberFruits VALUES(3, 2)
INSERT INTO MemberFruits VALUES(3, 4)
INSERT INTO MemberFruits VALUES(3, 6)
INSERT INTO MemberFruits VALUES(3, 7)
INSERT INTO MemberFruits VALUES(4, 3)
INSERT INTO MemberFruits VALUES(4, 4)
-- The set-based method
select mf.MemberID, m.[Name] as MemberName, mf.FruitID, f.[Name] as
FruitName
from MemberFruits mf
inner join Members m on mf.MemberID = m.MemberID
inner join Fruits f on mf.FruitID = f.FruitID
SELECT M.MemberID, M.Name,
STUFF(T.fruit_list, 1, 1, '') AS member_fruit_list
FROM Members AS M
CROSS APPLY(SELECT ',' + F.Name
FROM MemberFruits AS MF
JOIN Members AS M1
ON MF.MemberID = M1.MemberID
JOIN Fruits AS F
ON MF.FruitID = F.FruitID
WHERE M1.MemberID = M.MemberID
ORDER BY F.Name
FOR XML PATH('')) AS T(fruit_list);
--
Plamen Ratchev
http://www.SQLStudio.com
Why do you wish to destroy First Normal Form (1NF) with a concatenated
list structure? Normal forms are the foundation of RDBMS after all.
Why are you formatting data in the back end? The basic principle of a
tiered architecture is that display is done in the front end and never
in the back end. This is a more basic programming principle than just
SQL and RDBMS.
Get a reporting tool so you can do a sort for your lists and lay them
out on paper or a screen in a way that a human being can read them.
Yeah, I don't have to do this and I was aware of these potential
questions even before I asked here. It only makes it easier to bind
the result to an asp.net server control, for example, a GridView.
Otherwise, I think I have to either manually iterate through the
result and format the layout in my webform (a lot of coding) or bind
it to two controls from two returned tables.
I understand that formating is supposed to be done in the front end,
but I quite often do format data in the back end so that I don't have
to do anything in the front. For example, if I want to display dates
as MMM dd, yyyy, I use the sql server convert function to format it
instead of doing it in the front. Also, I quite often concatenate
FirstName, MiddleName and LastName to get the full name in my SQL
query instead of concatenating them in the front. I am not sure if
this is quite common among SQL gurus.
The example I had for my question is faked, what I was really working
on is a video tape management system. Each tape may have 1 to 4
languages. Instead of doing the following and formatting it in the
front,
TapeID Language
1 English
1 Spanish
1 French
2 English
2 Russian
3 English
3 Arabic
3 Chinese
3 Urdu
I thought it is much easier to get the following and bind it to a
GridView control.
TapeID Language
1 English, Spanish, French
2 English, Russian
3 English, Arabic, Chinese, Urdu
Of course, the assumption is that the number of values to be
concatenated is small, which is true in my case. How many languages
can a tape have?
I would have to ask the guys a OLAC (OnLine Audiovisual Catalogers),
but I seem to remember that Disney movies had 30+ languages.
Probably not on a single tape? I would think it is too much otherwise.
Canned answer: when you order a result set, you also "violate" some
fundamental principle. When you join parent and child tables, you
"violate" 2NF. Your own books have such "violations" all over the
place.
You yourself do it all the time.
You order a result set in a cursor and NOT in the database. If you
could do this in the schema, then the data would not be in a set or
multi-set. A lot of newbie without a math background think that an
ORDER BY clause is part of SELECT, but it is not. Read the SQL
Standards. What you get in many products is an implicit cursor for
display via QA or whatever tool you are using.
One of the jokes we had on the committee was that SQL really stands
for "Scarely Qualifies as a Language" because of the lack of I/O.
>> When you join parent [sic: referenced table?] and child [sic: referencing table?] tables, you "violate" 2NF. Your own books have such "violations" all over the place. <<
Besides getting the terms wrong, you have some conceptual errors.
Tables can be normalized, not queries, not cursors. The purpose of
the Normal Forms is to prevent certain data anomalies in the schema.
There are also non-Normal Form data anomalies possible (read Tom
Johnston's articles) and you use DRI to prevent some of them. Then
you use ASSERTIONs and other CHECK() constraints for other kinds of
data integrity.
Obviously, a query result cannot be lower than 1NF in RDBMS; we don't
recognize a "Zero Normal Form" (0NF) :)
But there is also a practical reason. The host program in ANY unknown
language must be able to accept the cursor's records (they are now
records with fields in host language data types in the host program
and not rows in the DB in SQL data types!). What is the one common
denominator among all procedural, OO and functional programming
languages? Sequential flat files with scalar values in the fields.
No special parsing, no lists, no arrays, no fancy stuff.
If you find an error in one of the books, drop me an email; I try to
get corrections in the working drafts of the next editions as I go
along. Right now, I am trying to bring old data element names into
lines with ISO-11179 and add some of the new features.
That is exactly my point. Concatenating a list in a result set cannot
destroy 1NF, because result sets cannot be normalized. So your
previous remark that "Why do you wish to destroy First Normal Form
(1NF) with a concatenated list structure?" is irrelevant.
So if I use "SELECT * FROM NormalizedTable; " the data from the query
will not be normalized even if the base table is? That makes no
sense. Normalization is a series of abstract properties that applies
to files, network databases, etc. based on keys
>> So your previous remark that "Why do you wish to destroy First Normal Form (1NF) with a concatenated list structure?" is irrelevant. <<
All I said was not to do it in the DB tier of the system; get a report
writer for data display like that.
Normalization does not apply to result sets. The following resutl set
does not have a key, and it "violates" 2NF:
Select Customers.LastName, CustomerOrders.OrderDate FROM ...
so what? Such queries are perfectly normal.
>
> >> So your previous remark that "Why do you wish to destroy First Normal Form (1NF) with a concatenated list structure?" is irrelevant. <<
>
> All I said was not to do it in the DB tier of the system; get a report
> writer for data display like that.
You are making an assumption that there is a report writer, which
might not be true. Another canned answer:
I am still playing with this in order to understand it. I created a
view as this:
Create View vMemberFruits
as
select
mf.MemberID,
m.[Name] as MemberName,
mf.FruitID,
f.[Name] as FruitName
from MemberFruits mf
inner join Members m on mf.MemberID = m.MemberID
inner join Fruits f on mf.FruitID = f.FruitID
Then, I mimicked you guys' query and did this:
select
m.MemberID,
m.MemberName,
stuff(t.fruit_list, 1, 1, '') as FruitList
from vMemberFruits as m
cross apply
( select ', ' + m1.FruitName
from vMemberFruits m1
where m.MemberID = m1.MemberID
Order by m.MemberName for xml path('')
)
as T(fruit_list)
The result is correct except that it contains duplicate rows. I know
that if I do SELECT DISTINCT as below, I won't get duplicate records,
but what should I do if I don't want to use DISTINCT? Thank you.
select DISTINCT
m.MemberID,
m.MemberName,
stuff(t.fruit_list, 1, 1, '') as FruitList
from vMemberFruits as m
cross apply
( select ', ' + m1.FruitName
from vMemberFruits m1
where m.MemberID = m1.MemberID
Order by m.MemberName for xml path('')
)
as T(fruit_list)
SELECT m.MemberID,
m.Name,
STUFF(t.fruit_list, 1, 1, '') AS FruitList
FROM Members AS m
CROSS APPLY
( SELECT ', ' + m1.FruitName
FROM vMemberFruits AS m1
WHERE m.MemberID = m1.MemberID
ORDER BY m.Name
FOR XML PATH('')
) AS T(fruit_list)
WHERE fruit_list IS NOT NULL;
It will do because
select m.MemberID,
m.MemberName
from vMemberFruits as m
Has duplicate rows...
If you make your select :
select m.MemberID,
m.MemberName,
stuff(t.fruit_list, 1, 1, '') as FruitList
from Members as m
cross apply
( select ', ' + m1.FruitName
from vMemberFruits m1
where m.MemberID = m1.MemberID
Order by m.MemberName for xml path('')
)
as T(fruit_list)
You should be fine.....
--
--
Dave Ballantyne
http://sqlandthelike.blogspot.com/
Thank you. So, it is essential in this case to cross apply between
the Members table and the View vMemberFruits as Plamen's query also
shows.
Well , yes because you want to show the list of fruits for each member,
not the list of fruits , for the member that this fruit belongs to .....
In a more generic sense , its essential to join/apply at the correct
level , not just in this case :)