I have developed a SQL Server database for a client that involves, as
one would guess, a whole handful of one-to-many relationships between
tables. At the heart of this database is the "Client". Basically
everything has a relationship to the "Client" table somehow (more or
less). The client that I have developed the database for has someone
working in partnership with them that wants to use SPSS to analyze the
data. Seems straightforward - or so I thought.
The problem is that the person wanting to use SPSS is demanding that
all the data be in one row per "Client". So, for example, there is
the "Client" table and there is the "Children" table. Obviously,
there is a one-to-many relationship between the "Client" and
"Children" table. So a query/view incorporating both these tables
will necessarily have "repeats" in the "ClientID" field where that
"ClientID" has multiple entries in the "Children" table.
The SPSS person claims that when she uses SPSS, the data she has used
has always fit this requirement of one row per primary data object.
In the example above, this means she would have data that, instead of
using a "Client" table a "Children" table, would have a fixed number
of columns, say 4. Each of these fields would store something like a
child's gender (for up to 4 children) for that one client. Clearly
terrible database design, but good for statistics, according to the
SPSS person.
So has anyone come across this problem before?
I feel like a major stats package like SPSS should be able to handle
this situation. I just don't know anything about SPSS so I don't know
how it would deal with data that is in a one-to-many format.
Is this a lack-of-training issue for the person using SPSS?
Any thoughts are greatly appreciated.
best,
jon
jlarosa at alumni dot brown dot edu
I do that all the time...
> I have developed a SQL Server database for a client that involves, as
> one would guess, a whole handful of one-to-many relationships between
> tables. At the heart of this database is the "Client". ...
>
> The problem is that the person wanting to use SPSS is demanding that
> all the data be in one row per "Client". So, for example, there is
> the "Client" table and there is the "Children" table. Obviously,
> there is a one-to-many relationship between the "Client" and
> "Children" table. So a query/view incorporating both these tables
> will necessarily have "repeats" in the "ClientID" field where that
> "ClientID" has multiple entries in the "Children" table.
>
> The SPSS person claims that when she uses SPSS, the data she has used
> has always fit this requirement of one row per primary data object.
> In the example above, this means she would have data that, instead of
> using a "Client" table a "Children" table, would have a fixed number
> of columns, say 4. Each of these fields would store something like a
> child's gender (for up to 4 children) for that one client. Clearly
> terrible database design, but good for statistics, according to the
> SPSS person.
This is not an SPSS problem per se, but a problem in the way one is
thinking
about the data as a whole. There is no ONE WAY to analyze everything in
such
a situation, it all depends on what you are trying to do. E.g., if you
want
to report on how many children the clients have, then limiting the data to
a maximum of 4 children per client will give the wrong results. Instead,
one
can do a query that returns one record per child, as many records per
client
as needed, and then AGGREGATE (that's an SPSS operation) to one record per
client, counting the number of children for each client. Like this (in
syntax):
sort cases by clientid. /* if not sorted yet.
aggregate outfile=* /presorted /break=clientid /nchild=N.
That can then be followed by statements that display statistics on the
NCHILD
variable, e.g., means or frequencies.
Other questions require other approaches. E.g., if you want to report on a
characteristic that is child-based, you need to keep it as one record per
child.
Note that multiple queries on the database can be done, there is not need
to
try and do everything all at once. And whatever SQL queries you need to do
(including complex ones that relate multiple tables) can be done from SPSS
via ODBC, the SQL syntax is embedded in the SPSS ODBC query statement.
- Moshe
>
> Is this a lack-of-training issue for the person using SPSS?
You are right. I'm using SPSS to analyze data from a relational database
with no problem to cope with the structure your are describing. AGGREGATE
function can do its best with this type of data.!
J.A.T.
jla...@alumni.brown.edu (Jonathan LaRosa) wrote in message news:<e6a1ce64.04110...@posting.google.com>...
Though agregates can solve some problems, there are other ways to do
things.
We work with survey data (questionnaires) which, if put into a
relational structure, would take up many tables. I understand your
analist's problem in that for them analysis occurs per 'case' - this is
primarily a structure problem.
For survey data (household rosters, basket of goods, etc.) flat files
work well.
Structurally, you need to know what the maximum count per case is of
repeating data (e.g. maximum 4 children per respondent). In this case,
you would make four 'child' variables in which the data for that child
is stored (e.g. child0_age, child1_age, etc.). Now you must make the
four variables into a multiple response set (e.g. mr_children_age).
This enables you to use the new variable singly and holds great
analysis benefits, especially when things start to get complex...
I used to run a lot of queries from spss, but ended up taking the time
to provide a single 'flat file' to analysts as this was easily
accessible to a wider audience.
As a database user, I would appreciate spss giving an option of storing
data directly in a database and not in a file...
Matty
Rubbish!!! MR sets have limited usefulness or applicability.
Unless you are interested in limiting yourself to XTABulations.
Neila