I think your DHT approach is still viable, but you have to deal with
scalability much more intelligently.
You'll likely always have a scalability problem due to your overly
eager client. Since the client attempts to upload its location data on
each reboot (per the suggested crontab entry), and since this "write"
is an expensive operation for the servers to handle, it is not going
to scale. You need to be able to throttle your clients and minimize
their upload sizes.
It is likely that over 90% of the uploads are unnecessary (i.e. the
laptop has not been stolen). You can probably throttle your clients'
upload rates in the very short term by seeding your gateway file
updates with many bogus host names (e.g. bogusplanet1.a.com). Given
your adeona.resources file with 50 entries and your retry count of
around 10 (i.e. #define RS_MAX_TRYSERVERCACHE 10), you would likely
need to add upwards of 10x bogus entries to keep clients from
performing updates so often. You should be able to perform this bad
seeding on the backend gateway update server (i.e. GWUPDATE_URL =
adeona.cs.washington.edu/gateways.adeona).
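
For a rough sense of how much padding it takes, here is a
back-of-the-envelope sketch. It assumes the client picks servers
uniformly at random from its cache and gives up after
RS_MAX_TRYSERVERCACHE failed attempts, which may not match your actual
selection logic:

    #include <math.h>
    #include <stdio.h>

    /* Estimate what fraction of reboots would still reach a real gateway
     * for different amounts of bogus padding.  Assumes random server
     * selection with RS_MAX_TRYSERVERCACHE (10) attempts per reboot. */
    int main(void)
    {
        const double real  = 50.0;                   /* entries in adeona.resources   */
        const int    tries = 10;                     /* RS_MAX_TRYSERVERCACHE         */
        const int    factors[] = { 0, 10, 50, 100 }; /* bogus entries = factor * real */

        for (unsigned i = 0; i < sizeof factors / sizeof factors[0]; i++) {
            double bogus   = factors[i] * real;
            double miss    = bogus / (real + bogus); /* chance a single try is bogus    */
            double success = 1.0 - pow(miss, tries); /* chance any of the 10 tries works */
            printf("%3dx bogus -> ~%4.1f%% of reboots still upload\n",
                   factors[i], 100.0 * success);
        }
        return 0;
    }

Under those assumptions, 10x padding still lets a majority of reboots
through, so treat it as a stopgap and lean on the client-side throttle
below for the real reduction.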
Ideally, you should have the client throttle itself. That would
require client software updates, even though it doesn't sound like you
have any kind of automatic software update notification mechanism.
The client should do something like record its last update timestamp
and compare it against the current time. If the last update is older
than, say, 20 hours, pick a random time within the next 4 hours (or
some assumed average uptime for laptops); if it is less than 20 hours
old, don't bother with an update. Better yet, have the backend servers
respond with a preferred backoff time after each upload, in case 20
hours is not enough to slow things down. When choosing these times you
need to strike a balance between quick recovery of stolen laptops and
scalability of the architecture.
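
In code, that throttle might look something like the sketch below. The
function and constant names are made up for illustration (they are not
from the Adeona sources), and the 20-hour/4-hour values are just the
numbers discussed above:

    #include <stdlib.h>
    #include <time.h>

    #define MIN_UPDATE_AGE (20 * 3600)  /* at most one upload per ~20 hours */
    #define JITTER_WINDOW  (4 * 3600)   /* spread uploads over a random 4h  */

    /* Returns how many seconds the client should wait before its next
     * upload attempt (0 means "upload right now").  last_upload is the
     * recorded timestamp of the previous upload; server_backoff is the
     * preferred backoff, in seconds, returned by the backend after the
     * last upload.  Caller is assumed to have seeded rand(). */
    long seconds_until_next_upload(time_t last_upload, long server_backoff)
    {
        long min_age = MIN_UPDATE_AGE;
        long age     = (long)(time(NULL) - last_upload);

        /* Let the backend stretch the interval if 20 hours isn't enough. */
        if (server_backoff > min_age)
            min_age = server_backoff;

        if (age < min_age)
            return min_age - age;        /* too recent: skip this boot */

        /* Old enough: fire at a random point in the next few hours so
         * clients in the same timezone don't all hit the servers at once. */
        return rand() % JITTER_WINDOW;
    }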
If you don't pick a random time, then you'll get a peak load where
everyone turns on their laptop at the same time (such as in the
morning for a given timezone) and pounds your backend servers with
updates. You need to distribute and smooth your load over time,
flattening out any correlated spikes such as those caused by diurnal
peak usage patterns.
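
To put rough numbers on it (the installed base and boot window here
are made-up figures, not measurements):

    #include <stdio.h>

    /* Compare the morning peak with and without randomized upload times. */
    int main(void)
    {
        double clients     = 100000.0;    /* hypothetical installed base              */
        double boot_window = 1.0 * 3600;  /* everyone boots within the same hour      */
        double jitter      = 4.0 * 3600;  /* random delay window from the sketch above */

        printf("no jitter:   ~%.1f uploads/sec at the peak\n",
               clients / boot_window);
        printf("with jitter: ~%.1f uploads/sec, spread out\n",
               clients / (boot_window + jitter));
        return 0;
    }

The absolute numbers don't matter much; the point is that a few hours
of random spread cuts the peak by roughly the ratio of the windows.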
Also, why are you bothering to upload the location information to the
DHT? It is a very expensive upload and disk-write operation that just
doesn't seem necessary and will always kill your scalability. I would
recommend just using the DHT as a rendezvous point (as it was
originally designed). If my laptop is stolen, I'll leave my home
computer on for 24 hours, waiting until the laptop is turned on and
updates the DHT with its current key. Polling the DHT key/value (i.e.
the laptop's current encrypted IP address) for the <10% of laptops
that are stolen probably causes much less load on the servers than
trying to upload the location data for the 90% that are not. The home
computer would then fetch the full location info from the laptop
directly, using the DHT as a tracker instead of as a distributed
storage network. It would even allow the home computer to issue a kill
command to the laptop, if desired.
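
The owner-side flow might look roughly like the sketch below.
dht_get(), decrypt_ip(), and fetch_location_from_laptop() are
hypothetical placeholders rather than real Adeona or DHT calls, and
the poll interval is arbitrary:

    #include <stdio.h>
    #include <unistd.h>

    /* Placeholders for the real lookup, crypto, and transfer code. */
    extern int dht_get(const char *key, char *value, size_t len);    /* poll the DHT     */
    extern int decrypt_ip(const char *value, char *ip, size_t len);  /* owner's key      */
    extern int fetch_location_from_laptop(const char *ip);           /* direct pull/kill */

    /* Run on the owner's home computer after a theft is reported. */
    int wait_for_stolen_laptop(const char *dht_key)
    {
        char value[256], ip[64];

        /* Poll one small key/value for up to 24 hours.  Only stolen
         * laptops ever generate this read traffic, so the DHT stays a
         * rendezvous point rather than a bulk storage network. */
        for (int i = 0; i < 24 * 60; i++) {
            if (dht_get(dht_key, value, sizeof value) == 0 &&
                decrypt_ip(value, ip, sizeof ip) == 0) {
                /* The laptop came online and published its encrypted IP:
                 * pull the full location log (or send a kill command)
                 * directly from the laptop, bypassing the DHT. */
                return fetch_location_from_laptop(ip);
            }
            sleep(60);   /* check once a minute */
        }
        return -1;       /* laptop never appeared within 24 hours */
    }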
Granted, these are some unusual ideas and you may not agree with most
of them, but I don't think you should abandon your original DHT
concept. If you try to bring in the corporate world to help, they will
just have a knee-jerk reaction to centralize things, defeating your
original goals. And if you swap out the DHT for some other type of
distributed database, you will still have to deal with flattening out
these peak loads in order to ultimately scale.