On Tue, Feb 22, 2011 at 14:26, Stian Håklev <sha...@gmail.com> wrote:
> We've always said that P2PU is interested in both sponsoring its own
> research projects and supporting outside researchers. In Barcelona, we talked
> about creating APIs or regular datadumps. One of the things that has been
> holding this work back a bit has been the indecision about how much data we
> can release, to whom etc.
>
> I'm hereby proposing that P2PU officially decides that all user interaction
> data are completely public.
Thanks for initiating the discussion, Stian. I was wondering what
decision had been made about this after the discussion with the
lawyers. Since I'm not a lawyer, I'm not going to comment on the
legalese; I'll just listen to opinions :)
Again, thanks for bringing this discussion up!
--
vid ॥ http://svaksha.com ॥
Stian:
"almost nobody else [...] share[s] material using open licenses" -- um, what?
As I tried to indicate, there's nothing particularly special or novel
about posting CC-By-SA content in a tarball. There isn't even a
question here. In my view this should just happen without discussion
or debate.
The question is about posting logs of interaction data. A lot of that
information *isn't* available by screen-scraping. Let's be clear that
that is what's at issue here. This shouldn't be a question either.
The question is what's at stake.
Joe
There is a fairly active debate over what constitutes "public" from
the perspective of human subjects boards in the US (the US tends to be
more protective in this regard than other countries I have experience
with, though our standards are becoming the norm elsewhere as well).
Just because a conversation occurs in public does not automatically
make the data exempt, though--again--there is a lot of discussion
over this. This is the reason, e.g., the AOL search data was fairly
untouchable, despite being unquestionably public.
An IRB will look at whether users knew that the material was not only
transparent (that is, available to the public) but could envision its
use in research. Again, this isn't the case at every IRB; some (e.g.
the University of California) are more restrictive than others. The OHRP
is likely to release new guidelines for online research this year that
may provide a bit more clarity in terms of expectations.
That said, if you wanted to make it easier for researchers to clear
the IRB hurdle, a signup that gave something more akin to real
informed consent would do it: i.e., explicitly saying that the work
can be used by (among others) researchers interested in online
interaction, that you can opt out at any time, etc.
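For what it's worth, a minimal sketch of what such a consent record and opt-out filter could look like (all names here are hypothetical illustrations, not P2PU's actual data model):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class ResearchConsent:
    """Per-user research consent captured at signup (hypothetical model)."""
    user_id: int
    consented_at: datetime                    # when the user agreed at signup
    opted_out_at: Optional[datetime] = None   # users can opt out at any time

    def is_active(self) -> bool:
        return self.opted_out_at is None

def exportable(consents, records):
    """Keep only interaction records from users whose consent is still active."""
    active = {c.user_id for c in consents if c.is_active()}
    return [r for r in records if r["user_id"] in active]

consents = [
    ResearchConsent(1, datetime(2011, 1, 5)),
    ResearchConsent(2, datetime(2011, 1, 9), opted_out_at=datetime(2011, 2, 1)),
]
records = [{"user_id": 1, "text": "hi"}, {"user_id": 2, "text": "bye"}]
print(exportable(records=records, consents=consents))  # user 2 opted out
```

The point of modelling opt-out as a timestamp rather than a boolean is that any data dump can then be filtered against consent as it stood at export time.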
Best,
Alex
One major issue that hasn't come up yet is the resources required to do this:
(1) think it through carefully and develop a strawman proposal,
(2) share with the full community for feedback, respond to comments,
update the strawman, etc.,
(3) put in place technology to track, analyse, and share the data,
and I would like to see (4) spend time working with people who are
using the data to make sure some of the benefits of their work flow
back to improving P2PU.
In terms of priorities for paid staff - I can see a conflict for the
work of developers in (3) - and would argue that building the new site
trumps this at least for the next while.
I don't want to put a damper on this discussion - this is an important
topic - but I also want to be realistic, and make sure we understand
the trade-off between this and other work.
P
When you want to build a system that emulates peer groups learning
together, try to think of what that would look like if your group was
surrounded by thousands of people with notepads watching you and
taking note of what you say. That's not peer learning anymore. It's
something different. From my perspective, it merely recreates the
problems that p2pu was trying to solve initially: that we treat
learning as access to learning objects when actually it's a lot more
about a group of people getting together to share a part of
themselves.
re. making 'public' information 'more public', Helen Nissenbaum has
some great pieces on 'contextual integrity'. She basically says that
privacy is violated when the context in which information is disclosed
is disrupted. For example, when you're sitting in a restaurant talking
to a friend, you're in a public place but you don't expect that others
will listen in. This means that making public data more public could
become a privacy violation.
Don't get me wrong: I think giving researchers access to data is
important - I just don't think it should be given away so readily.
p2pu has a trustworthy brand. But if you give everyone access to the
information that people feel like they're sharing with p2pu, there
could be some major problems (some of which you might not get pushback
on immediately since people generally don't recognise the effects of
unlimited access until something bad happens).
You could do some work anonymising the data, but I still think that
the challenge is going to be in giving people notice of this in a way
that continues to make them feel safe. Data that knows
no bounds doesn't make me feel safe.
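As one concrete first pass at that anonymisation work, user identifiers could be replaced with keyed hashes before any dump is produced. This is only a sketch under the assumption of a per-export secret key (all names hypothetical), and hashing IDs alone does not prevent re-identification from message content:

```python
import hashlib
import hmac

SECRET_KEY = b"rotate-me-per-export"  # hypothetical secret, changed per dump

def pseudonymise(user_id: str) -> str:
    """Deterministically map a user id to an opaque token via HMAC-SHA256."""
    return hmac.new(SECRET_KEY, user_id.encode(), hashlib.sha256).hexdigest()[:12]

def scrub(record: dict) -> dict:
    """Return a copy of an interaction record with the user id replaced."""
    out = dict(record)
    out["user"] = pseudonymise(out.pop("user"))
    return out

raw = {"user": "alice", "course": "webcraft-101", "action": "posted_comment"}
print(scrub(raw))  # same record, opaque user token
```

A keyed hash (rather than a plain one) matters here: without the secret, an outsider can't confirm a guessed username by hashing it themselves, yet the mapping stays consistent within one dump so interaction patterns remain analysable.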
--------------------
Heather Ford
UC Berkeley School of Information
http://blogs.ischool.berkeley.edu/masks/
I would caution against this - at least without very specific goals in
mind. What kinds of research would benefit from this kind of access?
How would it make learners feel to be told that everything they say
on a learning forum can be parsed, tracked, and analysed?
My point is that making your data publicly available in aggregate form can have a lot of intentional and unintentional consequences.
Moral: options are indeed good.
Personally I think the question shouldn't be "should the data be
open?", but rather, "what data, precisely, should be open, and why?".
I think this is not disconnected from the *other* concerns about user
experience that Dan has highlighted lately. At least in a first pass,
the most powerful "why's" would be: because sharing X data will help Y
researcher(s) address Z issue related to user experience. Other
concerns W are potentially very interesting, but I think at least at
the moment user experience seems key.
I agree. And perhaps further: which data should be closed and why,
which data should be open and why, and which data should be more open
and why. Information that is 'public' and under a cc license but
hidden in a forum somewhere that Google doesn't prioritize is very
different from information that is in a forum and tops search results
for a particular user (either their official name or pseudonym).
I think Stian's starting point on the open end of the spectrum is right.
At the same time, I find it hard to talk about this at the theoretical
level - it would be much easier to understand the implications if we
had a few concrete cases (not just hypothetical ones, but actual
commitments from people who want to do specific analysis).
Those examples would include the information Joe is looking for. Who
wants access to what data and for what? What will be the benefit, and
to whom?
As long as we are not sure how people are going to use it - we need to
consider the potential downsides even more carefully, including:
negative perceptions of users (justified or not), the fact that we
*might* be exposing data that will be used in ways we are not
comfortable with, and the time and effort to implement and support it.
P