using SPSS with a relational database

Jonathan LaRosa

unread,

Nov 1, 2004, 4:31:46 PM11/1/04

to

Has anyone out there used SPSS to analyze data from a relational
database?

I have developed a SQL Server database for a client that involves, as
one would guess, a whole handful of one-to-many relationships between
tables. At the heart of this database is the "Client". Basically
everything has a relationship to the "Client" table somehow (more or
less). The client that I have developed the database for has someone
working in partnership with them that wants to use SPSS to analyze the
data. Seems straightforward - or so I thought.

The problem is that the person wanting to use SPSS is demanding that
all the data be in one row per "Client". So, for example, there is
the "Client" table and there is the "Children" table. Obviously,
there is a one-to-many relationship between the "Client" and
"Children" table. So a query/view incorporating both these tables
will necessarily have "repeats" in the "ClientID" field where that
"ClientID" has multiple entries in the "Children" table.

The SPSS person claims that when she uses SPSS, the data she has used
has always fit this requirement of one row per primary data object.
In the example above, this means she would have data that, instead of
using a "Client" table a "Children" table, would have a fixed number
of columns, say 4. Each of these fields would store something like a
child's gender (for up to 4 children) for that one client. Clearly
terrible database design, but good for statistics, according to the
SPSS person.

So has anyone come across this problem before?
I feel like a major stats package like SPSS should be able to handle
this situation. I just don't know anything about SPSS so I don't know
how it would deal with data that is in a one-to-many format.

Is this a lack-of-training issue for the person using SPSS?

Any thoughts are greatly appreciated.

best,
jon
jlarosa at alumni dot brown dot edu

Moshe Braner

unread,

Nov 2, 2004, 10:07:26 AM11/2/04

to

On 1 Nov 2004 13:31:46 -0800, Jonathan LaRosa <jla...@alumni.brown.edu>
wrote:

> Has anyone out there used SPSS to analyze data from a relational
> database?

I do that all the time...

> I have developed a SQL Server database for a client that involves, as
> one would guess, a whole handful of one-to-many relationships between

> tables. At the heart of this database is the "Client". ...

>
> The problem is that the person wanting to use SPSS is demanding that
> all the data be in one row per "Client". So, for example, there is
> the "Client" table and there is the "Children" table. Obviously,
> there is a one-to-many relationship between the "Client" and
> "Children" table. So a query/view incorporating both these tables
> will necessarily have "repeats" in the "ClientID" field where that
> "ClientID" has multiple entries in the "Children" table.
>
> The SPSS person claims that when she uses SPSS, the data she has used
> has always fit this requirement of one row per primary data object.
> In the example above, this means she would have data that, instead of
> using a "Client" table a "Children" table, would have a fixed number
> of columns, say 4. Each of these fields would store something like a
> child's gender (for up to 4 children) for that one client. Clearly
> terrible database design, but good for statistics, according to the
> SPSS person.

This is not an SPSS problem per se, but a problem in the way one is
thinking
about the data as a whole. There is no ONE WAY to analyze everything in
such
a situation, it all depends on what you are trying to do. E.g., if you
want
to report on how many children the clients have, then limiting the data to
a maximum of 4 children per client will give the wrong results. Instead,
one
can do a query that returns one record per child, as many records per
client
as needed, and then AGGREGATE (that's an SPSS operation) to one record per
client, counting the number of children for each client. Like this (in
syntax):

sort cases by clientid. /* if not sorted yet.
aggregate outfile=* /presorted /break=clientid /nchild=N.

That can then be followed by statements that display statistics on the
NCHILD
variable, e.g., means or frequencies.

Other questions require other approaches. E.g., if you want to report on a
characteristic that is child-based, you need to keep it as one record per
child.

Note that multiple queries on the database can be done, there is not need
to
try and do everything all at once. And whatever SQL queries you need to do
(including complex ones that relate multiple tables) can be done from SPSS
via ODBC, the SQL syntax is embedded in the SPSS ODBC query statement.

- Moshe

Juanan

unread,

Nov 2, 2004, 10:32:45 AM11/2/04

to

"Jonathan LaRosa" <jla...@alumni.brown.edu> escribió en el mensaje

>
> Is this a lack-of-training issue for the person using SPSS?

You are right. I'm using SPSS to analyze data from a relational database
with no problem to cope with the structure your are describing. AGGREGATE
function can do its best with this type of data.!

J.A.T.

Neila Nessa

unread,

Nov 2, 2004, 2:00:57 PM11/2/04

to

> Is this a lack-of-training issue for the person using SPSS?

Hi,
I suspect so ;-)
Anyway, look at the Syntax manual under VECTOR. You will see
a means of transforming multiple rows to one row via AGGREGATE.
Also look at the data restructuring wizzer. May be helpful
tho I prefer the old style syntax rather than rigid wizzers.
Neila

jla...@alumni.brown.edu (Jonathan LaRosa) wrote in message news:<e6a1ce64.04110...@posting.google.com>...

matt...@gmail.com

unread,

Dec 7, 2004, 10:06:50 AM12/7/04

to

I use relational databases with spss, however, I have a different
solution to what has been offerred.

Though agregates can solve some problems, there are other ways to do
things.

We work with survey data (questionnaires) which, if put into a
relational structure, would take up many tables. I understand your
analist's problem in that for them analysis occurs per 'case' - this is
primarily a structure problem.

For survey data (household rosters, basket of goods, etc.) flat files
work well.

Structurally, you need to know what the maximum count per case is of
repeating data (e.g. maximum 4 children per respondent). In this case,
you would make four 'child' variables in which the data for that child
is stored (e.g. child0_age, child1_age, etc.). Now you must make the
four variables into a multiple response set (e.g. mr_children_age).
This enables you to use the new variable singly and holds great
analysis benefits, especially when things start to get complex...

I used to run a lot of queries from spss, but ended up taking the time
to provide a single 'flat file' to analysts as this was easily
accessible to a wider audience.

As a database user, I would appreciate spss giving an option of storing
data directly in a database and not in a file...

Matty

neila...@msn.com

unread,

Dec 7, 2004, 12:34:51 PM12/7/04

to

"Now you must make the"
"four variables into a multiple response set (e.g. mr_children_age)."

Rubbish!!! MR sets have limited usefulness or applicability.
Unless you are interested in limiting yourself to XTABulations.
Neila