suggestions

sg

May 26, 2009, 12:52:25 AM
to adeona-developers
I think your DHT approach is still viable but you have to deal with
scalability much more intelligently.

You'll likely always have a scalability problem due to your overly
eager client. The client attempts to upload its location data on each
reboot (per the suggested crontab entry), and this "write" is an
expensive operation for the servers to handle, so it is not going to
scale. You need to be able to throttle your clients and minimize
their upload sizes.

It is likely that over 90% of the uploads are unnecessary (i.e. no
stolen laptop). You should attempt to throttle your clients' upload
rates. You can probably achieve this in the very short term by
seeding your gateway file updates with many bogus host names (e.g.
bogusplanet1.a.com). Given your adeona.resources file with 50 entries
and your retry count of around 10 (i.e. #define RS_MAX_TRYSERVERCACHE
10), you would likely need to add upwards of 10x as many bogus entries
to keep clients from performing updates so often. You should be able to
perform this bad seeding on the backend gateway update server (i.e.
GWUPDATE_URL = adeona.cs.washington.edu/gateways.adeona).
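
A rough way to size the bogus seeding, as a sketch only. It assumes
(this is not confirmed against the client source) that the client
tries RS_MAX_TRYSERVERCACHE distinct entries picked uniformly at
random and skips the update if none of them is a real gateway. The
helper name skip_probability is made up for illustration.

```python
def skip_probability(real: int, bogus: int, tries: int) -> float:
    """Chance that `tries` distinct random picks all hit bogus entries.

    Assumes uniform picks without replacement; the real client may differ.
    """
    total = real + bogus
    p = 1.0
    for i in range(tries):
        if bogus - i <= 0:
            return 0.0  # ran out of bogus entries, so a real one must be hit
        p *= (bogus - i) / (total - i)
    return p


if __name__ == "__main__":
    # 50 real gateways, 10 tries, varying amounts of bogus padding
    for factor in (1, 5, 10, 20):
        p = skip_probability(real=50, bogus=50 * factor, tries=10)
        print(f"{factor:>2}x bogus entries: {p:.1%} of update attempts skipped")
```

Under these assumptions, 10x padding (500 bogus entries) skips a bit
under 40% of update attempts, so the padding factor is a fairly blunt
knob and would need tuning against real traffic.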

Ideally, you should have the client throttle itself. This would
require a client software update, although it doesn't sound like you
have any type of automatic software update notification mechanism.
The client should do something like record its last update timestamp
and compare it against the current time. If the last update is more
than, say, 20 hours old, then schedule the upload at a random time
within the next 4 hours (or some assumed average uptime for laptops).
If it has been less than 20 hours since the last update, don't bother
with an update. Ideally, you can have the backend servers respond with
a preferred backoff time for clients after an upload, in case 20 hours
is not sufficient to slow things down. You need to strike a balance
between quick recovery times for laptops and scalability of the
architecture when choosing these times.
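
Something along these lines, as a minimal sketch rather than a patch
against the actual client. The function name, the constants, and the
server_backoff_s parameter (a backoff value the backend might return)
are all hypothetical.

```python
import random
from typing import Optional

MIN_INTERVAL_S = 20 * 3600   # skip uploads if the last one is younger than this
JITTER_WINDOW_S = 4 * 3600   # spread due uploads over this window


def next_upload_delay(
    now: float,
    last_upload: Optional[float],
    server_backoff_s: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> Optional[float]:
    """Return seconds to wait before uploading, or None to skip this boot.

    `server_backoff_s` is a hypothetical value the backend could hand
    back after an upload to replace the 20-hour default.
    """
    rng = rng or random.Random()
    min_interval = MIN_INTERVAL_S if server_backoff_s is None else server_backoff_s
    if last_upload is not None and now - last_upload < min_interval:
        return None  # updated recently enough, do nothing
    # Due for an update: pick a random moment inside the jitter window
    return rng.uniform(0, JITTER_WINDOW_S)
```

The client would call this at boot with time.time() and the stored
timestamp, then sleep for the returned delay before uploading. Note
that a due upload can still wait up to 4 hours, which is the recovery
time being traded for scalability.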

If you don't pick a random time, then you'll get a peak load where
everyone is turning on their laptops at the same time (such as in the
morning for a given timezone) and pounding your backend servers with
updates. You need to distribute and smooth your load over time, and
flatten out any correlated events such as those caused by
diurnal/daily peak usage patterns.
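
To get a feel for how much the random delay flattens the peak, here is
a toy simulation. The numbers (10,000 laptops booting around 9:00 AM,
a 4-hour window) are invented for illustration and are not measurements
of real Adeona traffic.

```python
import random
from collections import Counter


def peak_per_minute(boot_minutes, jitter_minutes, rng):
    """Largest number of uploads landing in any single minute."""
    uploads = Counter()
    for boot in boot_minutes:
        # With no jitter the upload happens in the same minute as the boot
        delay = rng.randrange(jitter_minutes) if jitter_minutes > 0 else 0
        uploads[boot + delay] += 1
    return max(uploads.values(), default=0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical morning rush: boots spread between 8:45 and 9:15
    boots = [rng.randrange(8 * 60 + 45, 9 * 60 + 15) for _ in range(10_000)]
    print("peak uploads/min, no jitter: ", peak_per_minute(boots, 0, rng))
    print("peak uploads/min, 4 h jitter:", peak_per_minute(boots, 240, rng))
```

The total number of uploads is unchanged; the jitter only spreads them
out, so it helps with peaks but not with overall write volume.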

Also, why are you bothering to upload the location information to the
DHT? It is a very expensive disk upload and write operation that just
doesn't seem necessary and will always kill your scalability. I would
recommend just using the DHT as a rendezvous point (as it was
originally designed). If my laptop is stolen, I'll leave my home
computer on for 24 hours, waiting until the laptop is turned on and
updates the DHT with its current key. Polling the DHT key/value (i.e.
the laptop's current encrypted IP address) for the <10% of laptops
that are stolen probably causes much less load on the servers than
uploading the location data for the 90% that are not stolen. The home
computer would then just fetch the full location info from the laptop
directly, using the DHT as a tracker instead of as a distributed
storage network. It would even allow the home computer to issue a kill
command to the laptop, if desired.
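
A bare-bones sketch of that rendezvous flow, using an in-memory dict in
place of the DHT. The class and function names are hypothetical and
this is not OpenDHT's API. The encryption of the IP address is assumed
to happen elsewhere, so the value is treated as an opaque blob.

```python
from typing import Dict, Optional


class InMemoryDHT:
    """Stand-in for a DHT: a small key -> value store (not OpenDHT's API)."""

    def __init__(self) -> None:
        self._store: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._store[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._store.get(key)


def laptop_announce(dht: InMemoryDHT, key: bytes, sealed_ip: bytes) -> None:
    """Laptop side: publish only a tiny record, its current encrypted IP."""
    dht.put(key, sealed_ip)


def owner_poll(
    dht: InMemoryDHT, key: bytes, last_seen: Optional[bytes]
) -> Optional[bytes]:
    """Home computer side: return the record if it is new, else None."""
    current = dht.get(key)
    if current is not None and current != last_seen:
        return current
    return None
```

Once owner_poll returns a new record, the home computer would decrypt
the address and contact the laptop directly for the full location
report. That direct connection is outside this sketch.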

Granted, these are some unusual ideas and you may not agree with most of
them, but I don't think you should abandon your original DHT concept.
If you try to bring in the corporate world to help, they will just
have a knee-jerk reaction to centralize things, defeating your original
goals. If you try to swap out the DHT for some other type of
distributed database, you will still have to deal with flattening out
these peak loads in order to ultimately scale.

Phil

May 26, 2009, 2:52:59 PM
to adeona-developers
I disagree with a few key points here.

First off, not sending the data on every reboot is a dangerous thing.
Often, thieves will boot once to see what's on a laptop and then
shut it down and do what they are going to do with it (pawn it off,
wipe it, etc.). Posting to the backend on boot is very important. This
is also why location information is important. You should be able to
locate the laptop when it is turned off.

As far as peak loads and whatnot, I do understand this problem and it
is going to be something we have to deal with. However, from what I
understand, this is not our current problem as much as the reliability
and maintenance of the OpenDHT backend. Also, the nice thing about
internet peak times is that they are spread across time zones... the
9:00AM bell for your average 9-5 hits all day for different parts of
the world.

I am not familiar with any bandwidth-reducing techniques that Adeona
currently uses, but I do agree that only relevant (changed)
information needs to be uploaded. In other words, if there are 5
updates at a single location, only the first one would carry the
location information.
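
Roughly like this, as a sketch under assumptions: the client keeps a
digest of the last location it sent and only attaches the location
when the digest changes. The payload layout and the field names in the
location dict are hypothetical, not Adeona's actual format.

```python
import hashlib
import json
from typing import Optional, Tuple


def build_update(location: dict, last_digest: Optional[str]) -> Tuple[dict, str]:
    """Return (payload, digest); include the location only when it changed."""
    digest = hashlib.sha256(
        json.dumps(location, sort_keys=True).encode("utf-8")
    ).hexdigest()
    if digest == last_digest:
        payload = {"type": "heartbeat"}  # same place, nothing new to send
    else:
        payload = {"type": "full", "location": location}
    return payload, digest
```

The caller stores the returned digest and passes it back in on the next
update, so 5 updates from one location yield 1 full payload followed by
4 small heartbeats.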

Update notifications are probably a good thing, as is a news feed, so
that important messages like the current status of OpenDHT can be
relayed. Our first priority right now, though, should be getting to a
functional backend.

Zeropoint

May 28, 2009, 6:24:41 AM
to adeona-developers
OK, I have an idea. We need to keep traffic down and we need a better
way of managing the backend database. What if the database had a
distress table that listed lost computers? When your computer or
laptop goes online, it queries the distress table. If it is marked as
stolen, it updates every time it connects; otherwise the update is
only pushed every second logon, which would halve the traffic. The
recovery utility can add and remove entries in the distress table and
so control the client's operation. The distress table would usually be
small and could be kept on a single server. We could push forward with
a thin API to abstract the database. This gives us two advantages: the
thin API could run on any machine, and the database could be any type
you please. This also lends itself to in-house solutions for tracking
stolen computer equipment.
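
The client-side decision could look something like this. It is only a
sketch: the set stands in for a query against the distress table, and
the function and parameter names are made up.

```python
def should_upload(machine_id: str, logon_count: int, distress_table: set) -> bool:
    """Upload on every logon if listed as stolen, else on every second one."""
    if machine_id in distress_table:
        return True  # marked as stolen: report every time it connects
    return logon_count % 2 == 0  # normal operation: every second logon
```

For example, should_upload("laptop-1", 3, set()) skips the upload,
while the same call with {"laptop-1"} as the table uploads.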

Zeropoint

Yehuda Katz

May 30, 2009, 5:54:48 PM
to adeona-d...@googlegroups.com
This assumes that the laptop is reported stolen before the first time it is turned on.
Otherwise, you have the same issue that was mentioned in the OP.