Storing survey(s) data?

Nelson Goforth

unread,

Jul 19, 2001, 7:29:09 PM7/19/01

to

My client conducts a number of surveys or questionnaires for THEIR
clients. There are several standard surveys, each with a different
number of questions (from 20 up to 90).

Question is, how store the data.

Seems like the simple way would be to create a table with 100 fields
and store the answers (plus metadata - survey #, timestamp, etc)
there, each record using up as many fields as necessary. In this
case I'd probably store the metadata in fields 1-10 (as needed) and
begin the real data in field 11.

Or could use one table per survey type - so that only data from the
same list of questions goes into each table.

Or could use a simpler table that stores only one answer per record
(with a couple fields for metadata and one for the answer). Frankly
dissociating the data so much scares me a bit, but this would seem to
be the logical best choice - assuming nothing to go wrong.

Any thoughts, warning, experiences on this matter. - And in a related
question, what is the best form in which to EXTRACT the data for
running into a desktop graphing program? Write to a file in a public
directory?

Thanks for any help,
Nelson
--

===========================================================
Nelson Goforth Lighting for Moving Pictures
phone: 01.303.322.5042 pager: 01.303.634.9733
resume: http://www.earthnet.net/~ngoforth/film

---------------------------------------------------------------------
Before posting, please check:
http://www.mysql.com/manual.php (the manual)
http://lists.mysql.com/ (the list archive)

To request this thread, e-mail <mysql-th...@lists.mysql.com>
To unsubscribe, e-mail <mysql-unsubscribe-mysql=freebsd.csie...@lists.mysql.com>
Trouble unsubscribing? Try: http://lists.mysql.com/php/unsubscribe.php

Steve Werby

unread,

Jul 19, 2001, 9:36:12 PM7/19/01

to

"Nelson Goforth" <ngof...@earthnet.net> wrote:
> My client conducts a number of surveys or questionnaires for THEIR
> clients. There are several standard surveys, each with a different
> number of questions (from 20 up to 90).
>
> Question is, how store the data.
>
> Seems like the simple way would be to create a table with 100 fields
> and store the answers (plus metadata - survey #, timestamp, etc)
> there, each record using up as many fields as necessary. In this
> case I'd probably store the metadata in fields 1-10 (as needed) and
> begin the real data in field 11.

If you need to add/edit/delete a question you'll have to modify the table
structure and you'll likely have to modify all of your queries.

> Or could use one table per survey type - so that only data from the
> same list of questions goes into each table.

That's not really any better. Now if you add a survey you need to add a new
table and if you change a question you may have to change it in *many*
tables...that is, if you're surveys are sharing some questions.

> Or could use a simpler table that stores only one answer per record
> (with a couple fields for metadata and one for the answer). Frankly
> dissociating the data so much scares me a bit, but this would seem to
> be the logical best choice - assuming nothing to go wrong.

I recommend 3 tables. Table 1 stores the questions - question_id and
question_name. Table 2 stores the surveys - survey_id, question_id. Table
3 stores the survey results - user_id (or simple sequential id), survey_id,
question_id, response. If each question has set choices you'll need a 4th
table which will have question_id, choice_id, choice_description and instead
of response in Table 3 you'll use choice_id. Using this structure the
system will be extremely flexible and queries to report statistics will be
*much* simpler and changing the data in the application will be trivial.

> Any thoughts, warning, experiences on this matter. - And in a related
> question, what is the best form in which to EXTRACT the data for
> running into a desktop graphing program? Write to a file in a public
> directory?

Sorry, your questions are too vague for me to answer.

--
Steve Werby
President, Befriend Internet Services LLC
http://www.befriend.com/

Jeff S Wheeler

unread,

Jul 20, 2001, 1:39:42 AM7/20/01

to

In article <9j81uc$2bie$1...@FreeBSD.csie.NCTU.edu.tw>, "Steve Werby"
<steve...@befriend.com> wrote:

> "Nelson Goforth" <ngof...@earthnet.net> wrote:
>> My client conducts a number of surveys or questionnaires for THEIR

>> Seems like the simple way would be to create a table with 100 fields

>> and store the answers (plus metadata - survey #, timestamp, etc) there,
>> each record using up as many fields as necessary. In this case I'd
>> probably store the metadata in fields 1-10 (as needed) and begin the
>> real data in field 11.
>
> If you need to add/edit/delete a question you'll have to modify the
> table structure and you'll likely have to modify all of your queries.

No, not if he uses generic column names e.g. q1, q2, q3 and a helper
table to match up columns with descriptive question names. This helper
table can include a column that relates to a Survey ID (since he has
multiple surveys of differing data schemas).

>> Or could use one table per survey type - so that only data from the
>> same list of questions goes into each table.

Mrf, save me, Helper Table!

>> Or could use a simpler table that stores only one answer per record
>> (with a couple fields for metadata and one for the answer). Frankly
>> dissociating the data so much scares me a bit, but this would seem to
>> be the logical best choice - assuming nothing to go wrong.

NO, this is a bad idea for the sake of queries. You most likely want to
keep all your answers for each survey together in one record so you can
do complex queries over all responses to a given survey. Picture in your
head how each scenario would look in a spreadsheet. Think of how you
would apply different queries to the data using each of the two suggested
data organization methods.

> I recommend 3 tables. Table 1 stores the questions - question_id and
> question_name. Table 2 stores the surveys - survey_id, question_id.
> Table 3 stores the survey results - user_id (or simple sequential id),
> survey_id, question_id, response. If each question has set choices
> you'll need a 4th table which will have question_id, choice_id,
> choice_description and instead of response in Table 3 you'll use
> choice_id. Using this structure the system will be extremely flexible
> and queries to report statistics will be *much* simpler and changing the
> data in the application will be trivial.

My recommendation is similar, but keeps all the fields populated per
survey together in one record. Just use question_id as a column name
suffix (e.g. question_id=42, use column name Responses.q42). The rest of
the above suggestions are still valid for my approach.

--
Jeff S Wheeler j...@five-elements.com
Software Development Five Elements, Inc
http://www.five-elements.com/~jsw/
database sql query table

P.S.: Will my posts ever start making it to the list via the mail/news
gateway on FreeBSD.csie.NCTU.edu.tw? I am CC'ing my posts via email to
my...@lists.mysql.com now so they actually reach non-usenet readers.

Jeff S Wheeler

unread,

Jul 20, 2001, 2:40:24 AM7/20/01

to

[This is an email copy of a Usenet post to "mailing.database.mysql"]

---------------------------------------------------------------------

Nelson Goforth

unread,

Jul 20, 2001, 1:19:13 PM7/20/01

to

Thanks to Steve and Jeff for your replies.

A spreadsheet is how I've handled this for several years now.
Responses came in via a Scantron card reader (or manually) and the
resulting text file was fed into an Excel spreadsheet for graphic
output. I was in mind of the spreadsheet when devising a data
storage structure, but didn't want to be caught up in an inefficient
paradigm if there was a better way - hence I sought help from the
list.

Then I put in an interim system that allowed Web-based response,
wrote the responses to a text file (one file per questionnaire per
client) and then the text went into the same Excel spreadsheets.

The new system will store the data in MySQL table(s) for retrieval
not only of specific survey results, but (as I'm collecting industry
type and demographic data as well) in more complex ways as well. I
may actually STILL store the individual survey results in text files
for simplicity and 'bullet-proofness', but I need the added
flexibility of being able to retrieve the data in more than one way.
I AM a little uncertain of how best to get the data into a text file
to then feed into Excel (pretty reports and graphs), which was the
issue in my second question. Probably it's simply 'SELECT...INTO
OUTFILE...', but I'm trying to think of ways to avoid my client
having to ftp.

The actual questions ARE stored in a separate table (table 1 in
Jeff's example), with one questionnaire (regardless of number of
questions) per record. The questions are simply in one text field
and my Perl program parses them out into a table. All questionnaires
in this group are scored on a 1-5 scale - so I just use radio buttons
in an HTML form. Works great. Storing them this way also allows me
the potential for storing each questionnaire in multiple language
versions (one field in each record holds a language code, like 'en'
or 'es', and another holds the CHARSET code). I've also got a table
(Jeff's Table 2) that holds the information about the survey as a
whole (client name, industry type, etc). Another table holds
demographic data.

Since I'm wanting anonymity for the survey-takers, and since I want
to make sure that no one is stuffing the ballot box (which was
possible up to now), I also have the administrators create a set of
'tickets' (another table) for the survey. They enter the e-mail
addresses for the people they want to answer the survey, then the
system creates a ticket number (an MD5 digest involving a random
number) and that ticket number, attached to a URL, is mailed to the
survey taker. Once they use that ticket to take the survey the
ticket is 'punched' and can't be used again. The client is not
allowed to see the relationship between ticket number and e-mail
address and therefore anonymity is increased. I suppose I could
disassociate them even further with another step and remove the
association of the ticket number with the data, but the anonymity
isn't THAT critical and the ticket number is how I'll associate the
questionnaire responses with the demographic data.

So I'll go with keeping all the data together, and hope they don't
come up with any '100 questions' questionnaires. 90 is too many
anyway - takes hours!

Thanks again,
Nelson

> >> Seems like the simple way would be to create a table with 100 fields
>>> and store the answers (plus metadata - survey #, timestamp, etc) there,
>>> each record using up as many fields as necessary. In this case I'd
>>> probably store the metadata in fields 1-10 (as needed) and begin the
>>> real data in field 11.
>>
>> If you need to add/edit/delete a question you'll have to modify the
>> table structure and you'll likely have to modify all of your queries.
>
>No, not if he uses generic column names e.g. q1, q2, q3 and a helper
>table to match up columns with descriptive question names. This helper
>table can include a column that relates to a Survey ID (since he has
>multiple surveys of differing data schemas).
>
>>> Or could use one table per survey type - so that only data from the
>>> same list of questions goes into each table.
>
>Mrf, save me, Helper Table!
>

<---snip---->

>NO, this is a bad idea for the sake of queries. You most likely want to
>keep all your answers for each survey together in one record so you can
>do complex queries over all responses to a given survey. Picture in your
>head how each scenario would look in a spreadsheet. Think of how you
>would apply different queries to the data using each of the two suggested
>data organization methods.
>
> > I recommend 3 tables. Table 1 stores the questions - question_id and
>> question_name. Table 2 stores the surveys - survey_id, question_id.
>> Table 3 stores the survey results - user_id (or simple sequential id),
>> survey_id, question_id, response. If each question has set choices
>> you'll need a 4th table which will have question_id, choice_id,
>> choice_description and instead of response in Table 3 you'll use
>> choice_id. Using this structure the system will be extremely flexible
>> and queries to report statistics will be *much* simpler and changing the
>> data in the application will be trivial.
>
>My recommendation is similar, but keeps all the fields populated per
>survey together in one record. Just use question_id as a column name
>suffix (e.g. question_id=42, use column name Responses.q42). The rest of
>the above suggestions are still valid for my approach.

--

===========================================================

Nelson Goforth Lighting for Moving Pictures
phone: 01.303.322.5042 pager: 01.303.634.9733
resume: http://www.earthnet.net/~ngoforth/film

---------------------------------------------------------------------

Don Read

unread,

Jul 20, 2001, 3:30:25 PM7/20/01

to

On 20-Jul-2001 Nelson Goforth wrote:
> Thanks to Steve and Jeff for your replies.
>

<snip>

>
> The new system will store the data in MySQL table(s) for retrieval
> not only of specific survey results, but (as I'm collecting industry
> type and demographic data as well) in more complex ways as well. I
> may actually STILL store the individual survey results in text files
> for simplicity and 'bullet-proofness', but I need the added
> flexibility of being able to retrieve the data in more than one way.
> I AM a little uncertain of how best to get the data into a text file
> to then feed into Excel (pretty reports and graphs), which was the
> issue in my second question. Probably it's simply 'SELECT...INTO
> OUTFILE...', but I'm trying to think of ways to avoid my client
> having to ftp.
>

--- directly into Excel ---
print 'Content-type: application/ms-excel', "\n";
print 'Content-Disposition: inline filename="foo.csv"', "\n\n";

-- or save as file ---
print 'Content-type: octet/stream', "\n";
print 'Content-Disposition: attachment; filename="foo.csv"', "\n\n";

$res=SQLQuery("SELECT ...");

while ( @row = $res->fetchrow()) {
print join(',', @row), "\r\n";
}

Regards,
--
Don Read dr...@texas.net
-- It's always darkest before the dawn. So if you are going to
steal the neighbor's newspaper, that's the time to do it.
(53kr33t w0rdz: sql table query)