I've dived deeper into osw and your GSoC project ideas[1]. I'd like to work on
the "User search and discovery across networks" idea. I wrote a social
media monitoring crawler at my last job and used Lucene[2] for that.
Given that osw servers provide two pieces of information, the implementation
could be rather easy:
- osw servers provide a list of all accounts registered at this server
- osw servers provide a list of all other osw servers they know
I presume that there is some way to fetch the public vcard information of an
account.
So my project would be to crawl all osw servers, fetch the account lists,
fetch the available public vcard information for each account, push that into
lucene and write an integration into the GWT web app.
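The crawl described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the three callables (list_accounts, list_known_servers, fetch_vcard) are hypothetical stand-ins for whatever osw servers actually expose, and the Lucene indexing step is left out.

```python
from collections import deque


def crawl(seed_servers, list_accounts, list_known_servers, fetch_vcard):
    """Breadth-first crawl over osw servers, starting from seed_servers.

    All three callables are hypothetical placeholders, not an osw API:
      list_accounts(server)      -> iterable of account names
      list_known_servers(server) -> iterable of other server names
      fetch_vcard(account)       -> public vcard data, or None
    """
    seen = set(seed_servers)
    queue = deque(seed_servers)
    profiles = {}
    while queue:
        server = queue.popleft()
        try:
            accounts = list(list_accounts(server))
            peers = list(list_known_servers(server))
        except OSError:
            continue  # server down or unreachable: skip it in this run
        for account in accounts:
            vcard = fetch_vcard(account)
            if vcard is not None:
                profiles[account] = vcard
        for peer in peers:
            if peer not in seen:  # visit every server at most once
                seen.add(peer)
                queue.append(peer)
    return profiles
```

The returned dictionary (account name to vcard data) is what would then be pushed into the search index.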
The search engine I propose would not be decentralized. It would collect all
available public account information in one central database and make it
searchable from there. However, every osw installation would be free to provide
its own search server.
A decentralized search solution would not provide a good user experience.
It would need to contact maybe hundreds or even thousands of osw servers and
aggregate the search results. This leads to high latency, and results would
often be incomplete, since some servers will be down or unreachable at any
given time.
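To put a rough number on the availability problem: if a query has to reach every server, the chance of a complete answer shrinks quickly as the number of servers grows. The sketch below assumes that servers fail independently and that each one is reachable 99.9% of the time; both are illustrative assumptions, not measurements.

```python
def chance_all_respond(servers, availability):
    """Probability that all `servers` are up at once.

    Assumes independent failures and the same availability everywhere.
    """
    return availability ** servers


for n in (10, 100, 1000):
    print(n, round(chance_all_respond(n, 0.999), 3))
```

Under these assumptions, a fan-out to 1000 servers gets an answer from all of them only about 37% of the time.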
A searchable database of even billions of osw users is feasible, since
every profile contains only a few bytes. For a few million users, even plain
Lucene is good enough.
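A back-of-the-envelope check of the size argument, assuming 200 bytes of public data per profile. That figure is a guess for illustration only, and index overhead is ignored.

```python
def raw_profile_data_gb(users, bytes_per_profile):
    """Raw size of all public profiles in gigabytes (10**9 bytes)."""
    return users * bytes_per_profile / 1e9


# 200 bytes per profile is an assumed value, not a measured one.
print(raw_profile_data_gb(1_000_000_000, 200))
```

Even for a billion users this stays in the range of a few hundred gigabytes of raw data.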
What do you think?
[1] https://github.com/onesocialweb/osw-openfire-plugin/wiki/GSoC-Ideas-Page
[2] http://lucene.apache.org/
Best regards,
Thomas Koch, http://www.koch.ro
I'm not affiliated with OSW, but just wanted to let you know that I'm
working on a similar project. I'm using Python, however.
My project differs from what you describe in two ways. First, I use
synchronization between search servers, so each server has the same
database.
Second, if I understand you correctly, you want to maintain a list of OSW
servers that should be crawled. This will make things difficult for users who
want to run instances of OSW/Diaspora/Friendika just for themselves. I
suggest that OSW servers simply submit their VCard addresses to any
search server, which will then fetch the VCards. Synchronization makes sure
that every search server gets these VCards.
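A minimal sketch of the submit-and-synchronize idea. It is not how the repository below implements it: the class and method names are made up for illustration, fetch_vcard is a hypothetical callable, and the merge is a naive union in which the local copy wins on conflicts.

```python
class SearchServer:
    """Toy search server that accepts submissions and syncs with peers."""

    def __init__(self, fetch_vcard):
        # fetch_vcard(address) -> vcard data or None (hypothetical helper)
        self.fetch_vcard = fetch_vcard
        self.vcards = {}

    def submit(self, address):
        """Called by an OSW server: fetch and store the VCard at address."""
        vcard = self.fetch_vcard(address)
        if vcard is not None:
            self.vcards[address] = vcard

    def sync_with(self, other):
        """Naive two-way sync: afterwards both servers hold the union."""
        merged = {**other.vcards, **self.vcards}  # local copy wins
        self.vcards = dict(merged)
        other.vcards = dict(merged)
```

A single-user instance only has to submit its address to one search server; after a sync round, the other search servers know it as well.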
Here is my repository:
http://github.com/Leberwurscht/Diaspora-User-Directory
And here some discussion on the diaspora-dev mailing list:
http://groups.google.com/group/diaspora-dev/browse_thread/thread/b7e168187160f2b4
Note that I'm currently focussing on synchronization and spam
prevention. I have not yet concentrated on efficient searching in the
database, which is still SQLite.
Maxi
There is no public search engine for email addresses, but people still manage
to share their email addresses offline and use email for communication. So I
think that osw could still work for people who don't want to make even their
name public.
However, with email there are two workarounds:
a) When I want to contact somebody, I do a web search for his name and maybe
an affiliation (like "Debian"). I'll probably find an email he sent to a
public mailing list and can grab his email address from the search result.
b) Many contacts (like the one with you) were established by joining a
mailing list (a "channel" in buddycloud terms) and receiving a response to a
message.
I propose encouraging people to make a small part of their profile public:
name, country, account name, maybe a small photo(?). The great majority of
social network users already reveal more than that to public search engines
today. I don't see how the mere fact of my existence being known could do me
any harm.
Far more sensitive information, which should not be publicly exposed, is my
social graph, contact data, interests, channel subscriptions, CV,
affiliations, ...
As an optional addition to the public search engine, we could later provide
a (slower) distributed search: ask all my contacts whether they know a person
matching my search request. My profile would then have an option: "reveal my
profile for 2nd-degree searches".
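Such a search through my contacts could look roughly like this. The sketch works on in-memory dictionaries instead of real network requests, and the "second_degree" flag is an assumed name for the opt-in option described above.

```python
def second_degree_search(me, query, contacts, profiles):
    """Search the contacts of my contacts for names containing query.

    contacts: dict mapping account -> list of that account's contacts
    profiles: dict mapping account -> {"name": str, "second_degree": bool}
    Only profiles that opted in ("second_degree" is True) are returned.
    """
    found = []
    seen = {me}
    for contact in contacts.get(me, []):
        for person in contacts.get(contact, []):
            if person in seen:
                continue  # never report myself or the same person twice
            seen.add(person)
            profile = profiles.get(person)
            if (profile and profile.get("second_degree")
                    and query.lower() in profile["name"].lower()):
                found.append(person)
    return found
```

In a real deployment each contact's server would answer the inner loop itself, which is where the extra latency comes from.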
Of course, people still have the option not to publish any information about
themselves.
@Diana: Any response on whether XSF would accept you as a mentor for a GSoC
project?