Hello, and thank you. This question is about data collection for a real
application; my hobbyist interest in crypto has left me without a
clue on how to proceed, except that I need help and should ask here.
I am trying to collect data about treatments for an illness that still
carries a high level of social stigma. Specifically I am trying to
collect information from sufferers on what they have tried and what has
worked or failed. Such data should exist, but either it doesn't or it is
heavily contaminated: shysters who want to exploit victims claim
unbelievable levels of success, and even genuine practitioners' results
are dubious. When treatment fails, many patients leave the programs and
are then labelled as "uncooperative", and their lack of a positive result
is excluded from any published result.
My final objective is to come up with a list, built from sufferers'
submissions, that rates treatments that have enough submissions to be
significant and that have an effect strong enough not to be regarded as
simply a placebo effect.
This is what I think I need:
1. An anonymous way (via the internet - either web site or e-mail or a
combination) for a sufferer to submit information.
2. But at the same time a way that the submission/data can be checked
for multiple submissions from shysters out to make a buck and even
multiple submissions from people who are "converts" to particular
treatments who do not realise how their multiple submissions will
damage/degrade collected data and results.
Any suggestions on where to start?
Thanks and merry Christmas.
If there's no difference in potential submitters' knowledge before they
submit, there's no way for you to decide if two submissions are from
distinct persons, or from one person that simply forgot everything after
the first submission.
Therefore, unless you have some infrastructure, it seems difficult to
get anywhere using cryptography.
Alternative, heuristic approaches such as filtering on IP addresses may
give useful results.
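
A minimal sketch of what such a filter might look like, assuming each
submission arrives as a (source address, text) pair. The function name,
the record shape, and the one-per-address threshold are all hypothetical
choices made for illustration, not something proposed in this thread.

```python
from collections import defaultdict


def flag_repeat_sources(submissions, max_per_source=1):
    """Return the source addresses that sent more than `max_per_source` items.

    `submissions` is an iterable of (source_ip, payload) pairs. Both the
    record shape and the threshold are assumptions made for this sketch.
    """
    by_source = defaultdict(list)
    for source_ip, payload in submissions:
        by_source[source_ip].append(payload)
    # Keep only the sources that exceed the allowed number of submissions.
    return {ip: items for ip, items in by_source.items()
            if len(items) > max_per_source}


# Illustrative data using documentation-only address ranges.
sample = [
    ("192.0.2.10", "treatment A helped"),
    ("192.0.2.10", "treatment A is a miracle"),
    ("198.51.100.7", "treatment B did nothing"),
]
flagged = flag_repeat_sources(sample)
```

This is only a heuristic: people behind a shared address get flagged
together, anyone who changes address slips through, and keeping the
addresses at all works against the anonymity asked for.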
- kg
> I am trying to collect data about treatments for an illness that still
> carries a high level of social stigma.[snip]
I hate to discourage you, but your expectations seem too ambitious to
be realistic in view of the chaotic nature of the internet, with quite
a few malicious people always wanting to hinder/destroy anything that
is intended to be good. On the other hand, there are medical students
whose theses need statistical data from patients having received
treatments from clinics. Their questionnaires (anonymous, with a
serial identification number only) are officially sent by the clinics
so that the patients are likely to have the trust and goodwill to
respond. To my knowledge, even such data collections are not easy to
carry out successfully in practice.
M. K. Shen
This sounds more like a survey methodology issue than a crypto issue.
I'd be quite surprised if there were any simple and effective crypto
solutions to your problems. (Of course, simple solutions are easy to
come by; it's the "effective" part that's difficult.)
In particular, there's a fundamental conflict between perfect
anonymity and preventing "ballot stuffing": if there's no way you
can tell who I am, how can you tell I'm not the same person as the
last guy who answered your survey?
Conventional elections solve the problem by requiring voters to
identify themselves but shuffling the ballots before they're counted.
This is not the only possible solution: for example, you could instead
gather all the participants together before the survey and have them
each draw a unique ID token from a well mixed bowl. But in either
case, you need some way to convince the participants that the
shuffling really is random. The less they trust you and the more they
want to avoid having their answers revealed, the more convincing
you'll need to be.
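
A rough sketch of the conventional-election approach, assuming the
organizer already holds a list of eligible IDs. The function and variable
names are made up for illustration, and the sketch says nothing about how
participants would verify that the shuffle really happened.

```python
import secrets


def run_ballot_box(eligible_ids, cast):
    """Accept at most one ballot per eligible ID, then shuffle the ballots.

    `cast` is a list of (voter_id, ballot) pairs in arrival order.
    Returns the set of IDs that voted and the shuffled list of ballots.
    """
    voted = set()
    box = []
    for voter_id, ballot in cast:
        if voter_id not in eligible_ids or voter_id in voted:
            continue  # unknown participant or repeat submission: rejected
        voted.add(voter_id)
        box.append(ballot)  # the ballot is stored without the voter ID
    # Shuffle so that arrival order no longer links a ballot to a voter.
    secrets.SystemRandom().shuffle(box)
    return voted, box


voted, box = run_ballot_box(
    {"a", "b", "c"},
    [("a", "X"), ("b", "Y"), ("a", "Z"), ("d", "W")],
)
```

Here the repeat ballot from "a" and the ballot from the unknown "d" are
dropped, and what is left for counting is just the shuffled ballots.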
(Note that, even if you had some cryptographic protocol to do the
shuffling using multiparty computation or something, you'd still have
to convince the participants that your protocol really worked the way
you say. Which is likely to be difficult if the math involved goes
over most participants' heads.)
Anyway, to a large extent it comes down to a matter of trust. If the
participants trust you enough, it's enough for you to tell them that
you'll treat their answers confidentially. After that, it's just a
matter of taking the appropriate steps to avoid accidental disclosure.
Conversely, if they don't trust you, no amount of crypto tricks is
likely to convince them that you won't be able to identify them -- if
anything, setting up complex procedures is likely to just make people
more suspicious.
That said, there are some methods that people have developed for
collecting statistical data on sensitive topics, such as "randomized
response" surveys. ("Throw two coins. If the first coin comes up
heads, answer A for yes and B for no; if it comes up tails, look
at the second coin and answer A for heads and B for tails.")
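
To make the two-coin rule concrete: with fair coins, half the
participants answer truthfully and the other half answer A with
probability 1/2, so P(A) = 0.5 * p + 0.25, where p is the true "yes"
rate. Below is a small simulation sketch of that rule; the function
names, the seed, and the 30% rate are arbitrary illustrative choices.

```python
import random


def randomized_answer(truth, rng):
    """Return 'A' or 'B' following the two-coin rule quoted above."""
    first_heads = rng.random() < 0.5
    second_heads = rng.random() < 0.5
    if first_heads:
        # First coin heads: answer truthfully (A = yes, B = no).
        return "A" if truth else "B"
    # First coin tails: the answer is decided by the second coin alone.
    return "A" if second_heads else "B"


def estimate_yes_rate(answers):
    """Invert P(A) = 0.5 * p + 0.25 to estimate the true 'yes' rate p."""
    share_a = sum(1 for a in answers if a == "A") / len(answers)
    return 2.0 * (share_a - 0.25)


rng = random.Random(12345)   # fixed seed so the run is repeatable
true_rate = 0.30             # assumed for the simulation only
population = [rng.random() < true_rate for _ in range(100_000)]
answers = [randomized_answer(t, rng) for t in population]
estimate = estimate_yes_rate(answers)
```

With a large sample the estimate should land close to the true rate, even
though no single A or B says for certain what that participant's real
answer was. With small samples the estimate is noisy and can even fall
outside the range 0 to 1.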
There are two problems with such schemes, though: first, while they do
reduce the amount of information participants have to reveal, they
don't, and can't, eliminate the risk of disclosure entirely. And
second, even if there were no risk, people still wouldn't trust them
completely.
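
The first problem can be put in numbers with Bayes' rule. Under the
two-coin rule described above, a true "yes" produces answer A with
probability 0.75 and a true "no" with probability 0.25. The sketch below
computes how much an observed A shifts belief; the function name and the
10% prior are illustrative assumptions.

```python
def posterior_yes_given_a(prior_yes):
    """P(true answer is yes | participant answered A) for the two-coin rule."""
    p_a_given_yes = 0.5 + 0.5 * 0.5  # truthful branch, or random branch giving A
    p_a_given_no = 0.5 * 0.5         # only the random branch can give A
    numerator = p_a_given_yes * prior_yes
    return numerator / (numerator + p_a_given_no * (1.0 - prior_yes))


# Assumed prior: 10% of the surveyed population would truthfully say yes.
posterior = posterior_yes_given_a(0.10)
```

With a 10% prior, seeing an A raises the probability of a true "yes" to
roughly 25%. That is far from certainty, but it is not nothing either.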
(I vaguely recall a study where an anonymous randomized response
survey was sent to a bunch of people known to have been convicted of
some crime, asking them, among other things, whether they'd ever been
convicted of that crime. The number of positive replies received was
significantly lower than it should've been, if everyone had followed
the instructions honestly.)
--
Ilmari Karonen
To reply by e-mail, please replace ".invalid" with ".net" in address.
I was hoping there was something useful with anon. voting protocols -
which I know nothing about - but no joy. Thanks KG - a straight answer
is worth much.
Merry Christmas