BCI Data Worries

Mike Giddens

Sep 15, 2008, 11:22:07 PM
to Biodiversity Collections Index
Roger,

I was talking with someone who was new to BCI about how to change
data. I told her to set up an account and log in, and then she could
edit her data. Her first question was: anyone can do this, so what
would happen if someone set up a fake account and changed a lot of
data to spam or something else bad?

I told her I was not sure how this was handled but would find out.

So my questions are:

What happens if someone goes in and changes a lot of data to false data?

Is there an admin rollback, or a way to erase all of user_x's changes?

Have you experienced any fake accounts yet, or hacking attempts that
might bring the site down?

I know it is always scary to let the community make edits, and hard
to know how to protect against abuse.

-Mike

rogerhyam

Sep 16, 2008, 11:20:48 AM
to Biodiversity Collections Index
Hi Mike,

On Sep 16, 4:22 am, Mike Giddens <mikegidd...@silverbiology.com>
wrote:

> What happens if someone goes in and changes a lot of data to false data?

Currently all you need to register with BCI is a valid email address.
It is feasible that bad people could log in and start vandalizing the
site. This danger is mitigated by a number of factors:
* I watch the modifications going past and would step in and
disable an account that went feral.
* The system is bespoke and so difficult for a robot to discover
and exploit, as happens with common blogging systems (even this Google
group).
* No HTML or images are permitted in fields, which makes the site
less attractive to spammers.
* Most importantly, there is non-editable authoritative data on the
site that can act as a backup if rubbish is posted.

It is a common model on the web to have open registration systems and
I would be loath to lock it down at all, but if we had trouble I
would have to do that. You have to apply for membership of MorphBank,
for example (http://www.morphbank.net/Admin/userapplication.php),
though I am sure they will let anyone who seems reasonable join.

I like to keep barriers low but I would welcome other people's
thoughts on this matter.

> Is there an admin rollback, or a way to erase all of user_x's changes?

All changes to collection records are logged (a trigger copies the db
row before it is updated). Currently there is no interface for rolling
back changes. It would require me (or some sysadmin) to run a series
of SQL commands to roll back a particular user's changes, but it
could and would be done.
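
For anyone curious what that kind of logging and manual rollback can look like, here is a minimal sketch in Python using SQLite. It is not BCI's actual schema or code: the `collection` and `collection_log` tables, their columns, and the `rollback_user` helper are all assumptions made up for illustration.

```python
import sqlite3

# Hypothetical schema: one live table plus a log table filled by a trigger.
SCHEMA = """
CREATE TABLE collection (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    modified_by TEXT NOT NULL
);
CREATE TABLE collection_log (
    log_id INTEGER PRIMARY KEY AUTOINCREMENT,
    id INTEGER NOT NULL,
    name TEXT NOT NULL,
    modified_by TEXT NOT NULL,  -- author of the version that was overwritten
    replaced_by TEXT NOT NULL   -- user whose update overwrote it
);
CREATE TRIGGER collection_before_update
BEFORE UPDATE ON collection
BEGIN
    -- Copy the old row into the log before the update is applied.
    INSERT INTO collection_log (id, name, modified_by, replaced_by)
    VALUES (OLD.id, OLD.name, OLD.modified_by, NEW.modified_by);
END;
"""


def rollback_user(conn, user):
    """Restore records last edited by `user` to the version that existed
    before that user's first edit. Returns the number of records restored."""
    rows = conn.execute(
        """
        SELECT id, name, modified_by FROM collection_log
        WHERE log_id IN (
            SELECT MIN(log_id) FROM collection_log
            WHERE replaced_by = ? GROUP BY id
        )
        """,
        (user,),
    ).fetchall()
    restored = 0
    for rec_id, name, modified_by in rows:
        # Only touch rows the user still "owns", so later edits by
        # other people are not silently overwritten.
        cur = conn.execute(
            "UPDATE collection SET name = ?, modified_by = ? "
            "WHERE id = ? AND modified_by = ?",
            (name, modified_by, rec_id, user),
        )
        restored += cur.rowcount
    conn.commit()
    return restored
```

Note that the rollback is itself an update, so the trigger logs the vandalized version too and nothing is lost. Records that someone else edited after the bad user are skipped and would need a manual look.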

There is a feature in the 'future release' category for a rollback as
part of the editing process so that anyone could see previous versions
of a record and opt to revert to them. Again I would welcome people's
thoughts on how important this is to implement.

> Have you experienced any fake accounts yet, or hacking attempts that
> might bring the site down?

No hacking yet (touch wood) but we haven't widely promoted the site.
(As an aside, the simple form allowing people to register interest in
the site that we had up on the same domain prior to launch was getting
spam posted through it quite regularly, but that was a plain HTML form
that was easier to automate, not a bespoke AJAX-based interface.) I am
sure it will happen at some point, but we are a pretty uncontroversial,
low-profile site so not that attractive.

> I know it is always scary to let the community make edits, and hard
> to know how to protect against abuse.

I believe BCI is one of very few sites in the taxonomy community
that allow this open model. I started a thread on TAXACOM a while
back to discuss this (it has been discussed many times before).
Taxonomists jealously guard their data, but the result is that far
more content is being generated on open systems like Wikipedia than
on not-yet-open or vetted systems like EoL. The thread is
archived here if you are interested in reading it:

http://mailman.nhm.ku.edu/pipermail/taxacom/2008-July/027422.html

Any thoughts on the balance between inclusiveness and keeping the
rabble out (I'm reminded of the Groucho Marx quote) would be welcome.

Thanks for your support,

Roger

Mike Giddens

Sep 19, 2008, 10:11:09 AM
to Biodiversity Collections Index
One thing I would suggest to help keep a major accident from happening
would be to add flood protection.

http://www.phpclasses.org/browse/file/9165.html

This is an example that has a threshold. I would not put it on the
API services but definitely on your saving tasks. That way, if someone
wanted to flood your data with spam, it could at least shut that IP
down. In most cases a real person would not make that many different
changes, so you could set this threshold tight to keep hackers
out. I have even tweaked this class to blacklist the IP and email me
if something strange is going on.
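
To make the threshold idea concrete, here is a rough sketch in Python. It is not the PHP class linked above; the `FloodGuard` name, its parameters, and the `on_block` hook (where an email alert could go) are assumptions for illustration only.

```python
import time
from collections import defaultdict, deque


class FloodGuard:
    """Per-IP sliding-window throttle with a simple blacklist (sketch)."""

    def __init__(self, max_actions=20, window_seconds=60.0,
                 clock=time.monotonic, on_block=None):
        self.max_actions = max_actions        # allowed saves per window
        self.window_seconds = window_seconds  # window length in seconds
        self.clock = clock                    # injectable for testing
        self.on_block = on_block              # hypothetical alert hook
        self._recent = defaultdict(deque)     # ip -> timestamps of saves
        self.blacklist = set()

    def allow(self, ip):
        """Return True if this save may proceed, False if the IP is blocked."""
        if ip in self.blacklist:
            return False
        now = self.clock()
        recent = self._recent[ip]
        # Forget saves that have fallen out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_actions:
            # Over the threshold: block the IP and notify if a hook is set.
            self.blacklist.add(ip)
            if self.on_block is not None:
                self.on_block(ip)
            return False
        recent.append(now)
        return True
```

The save handler would call `guard.allow(client_ip)` before writing anything and refuse the request when it returns False. A blacklisted IP stays blocked until someone removes it from `guard.blacklist` by hand, which is a deliberate choice for a low-traffic site.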

-Mike

rogerhyam

Sep 22, 2008, 6:32:44 AM
to Biodiversity Collections Index
Thanks Mike,

I'll have a look at throttling flooders as we move forward. I have
been thinking about allowing updates through the JSON service. That
would require authentication and the passing back and forth of some
kind of key, which would be a good place to throttle people :)

At the moment I am thinking session management would be handled
explicitly rather than using browser sessions. This would be so people
could use the service in non-browser clients. On the other hand I
could just expect non-browser clients to look to the cookie headers to
do their stuff....hmm
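
As a rough illustration of the explicit option, here is a small Python sketch in which the client logs in once, gets a key back, and sends it inside every JSON request instead of relying on cookies. Everything here is hypothetical: the `TokenSessions` class, the `session_key` field name, and the `check_credentials` callback are assumptions, not part of the existing BCI service.

```python
import secrets
import time


class TokenSessions:
    """Explicit session keys passed in the request body, not in cookies."""

    def __init__(self, check_credentials, ttl_seconds=1800.0,
                 clock=time.monotonic):
        self.check_credentials = check_credentials  # (user, password) -> bool
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._sessions = {}  # token -> (username, expiry time)

    def login(self, username, password):
        """Return a fresh session key, or None if the login fails."""
        if not self.check_credentials(username, password):
            return None
        token = secrets.token_urlsafe(32)
        self._sessions[token] = (username, self.clock() + self.ttl_seconds)
        return token

    def user_for(self, token):
        """Return the username for a live key, or None if unknown or expired."""
        entry = self._sessions.get(token)
        if entry is None:
            return None
        username, expires = entry
        if self.clock() >= expires:
            del self._sessions[token]
            return None
        return username


def handle_update(sessions, request):
    """Handle a decoded JSON update request that carries its own key."""
    user = sessions.user_for(request.get("session_key", ""))
    if user is None:
        return {"ok": False, "error": "not authenticated"}
    # A real service would validate and save request["record"] here.
    return {"ok": True, "updated_by": user, "record": request["record"]}
```

Because the key travels in the request itself, any client that can send JSON can use the service, and the same lookup is a natural place to hang per-user throttling.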

Do you have thoughts on this?

Roger

Mike Giddens

Sep 22, 2008, 8:38:42 AM
to Biodiversity Collections Index
Thinking about a loose, open system for write permissions leads me to
an API key design. In Google's case they use the referrer URL to create
a one-way key that grants access to their services. This is
the loose case: the client says "here is my key" and it matches the
URL/domain that the request is coming from. This keeps others from
using your key for access as long as they do not reside on that domain.

In your case you might have users log in with their username/password
and from there go to a "generate key" page. That way each API key is
registered to a user account, so every time you log a request made
with the key you can later match it to the user. You know the
requests are coming from a certain domain, and you know which user
created the account if you ever need to contact them. It also lets
you track how much the service is being used and by whom, which is
good for the people funding the project.
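
Here is a minimal Python sketch of that idea, assuming the key is an HMAC of the username and domain under a server-side secret. This is one possible way to build a one-way key, not Google's actual scheme; the `ApiKeyRegistry` class and its method names are invented for illustration.

```python
import hashlib
import hmac
from collections import Counter
from urllib.parse import urlsplit


class ApiKeyRegistry:
    """Issue API keys tied to a user account and a domain (sketch)."""

    def __init__(self, server_secret):
        self._secret = server_secret  # bytes, known only to the server
        self._owners = {}             # key -> username
        self.usage = Counter()        # username -> authorized request count

    def _derive(self, username, domain):
        # One-way: the key cannot be turned back into the secret.
        message = f"{username}|{domain.lower()}".encode("utf-8")
        return hmac.new(self._secret, message, hashlib.sha256).hexdigest()

    def generate(self, username, domain):
        """Called after a normal username/password login."""
        key = self._derive(username, domain)
        self._owners[key] = username
        return key

    def authorize(self, key, referrer_url):
        """Return the owning username if the key matches the referrer's
        domain, otherwise None. Successful calls are counted per user."""
        username = self._owners.get(key)
        if username is None:
            return None
        domain = urlsplit(referrer_url).hostname or ""
        if not hmac.compare_digest(key, self._derive(username, domain)):
            return None
        self.usage[username] += 1
        return username
```

The `usage` counter is what would feed reports on who is using the service and how much. Keep in mind that a referrer header can be forged by a non-browser client, so this really is the loose case: it discourages casual key sharing rather than stopping a determined attacker.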

I have a simple PHP key generator I will send your way to show you a
real-world example. I am on the road for a few more days, so when I
get back to the office I will email you.

Here are some links to docs on the concept:
https://ajax.dev.java.net/ajax/api-keys
http://www.streetdirectory.com/travel_guide/6492/online_business/how_to_use_your_google_api_key_as_your_secret_weapon_part_one.html
There is another good one, but I cannot find it right now.

-Mike