http://babyl.dyndns.org/techblog/entry/cpanvote-a-perl-mini-project
This seems very much in line with our conversations of how we wanted to proceed with the API. Let me just share some of my thoughts. We can hash this out a bit and then map it out in the Wiki.
What we had thought of was to add an authentication layer to the API -- something outside of ElasticSearch, I would think. Probably some 3rd party auth with tokens (like Twitter, Google, etc). How I *think* it might work is:
* User comes to something.metacpan.org
* User wants to create account
* User authenticates via Twitter/3rd party
* Account is created and user is logged in
Once the user is logged in, we can allow her to begin adding metadata like favourite modules, dists, authors etc. User can also up and downvote modules and whatever else someone may dream up (follow authors etc). If this user is a CPAN author, an authentication email can be sent to the author's CPAN email. If the link in the email is clicked, this user now has an author "role", which allows the user to edit her own author metadata (which we're currently doing via the json config file). All of this stuff would (I think) make sense in the ElasticSearch index, so that other apps can query and get data on most/least favourite modules, authors, dists etc. We don't keep any private data on file. Anything which shouldn't be public doesn't get housed on our end, with the exception, perhaps, of the auth tokens we'll need, which would reside outside of the index.
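To make that a bit more concrete, here's a purely illustrative sketch of what a public user document in the index might look like (none of these field names are decided; the PAUSE id is just an example):

    # purely illustrative -- field names and layout are all up for discussion
    my $user_doc = {
        user_id    => 'olaf',
        favourites => [ 'Dancer', 'DBIx-Class' ],       # dists the user starred
        votes      => { 'Moose' => 1, 'Try-Tiny' => -1 },
        roles      => ['author'],    # granted after the CPAN email confirmation
        pauseid    => 'OALDERS',
    };
    # auth tokens and anything else private would live outside the index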
Once this system is in place, the authentication system would also allow other apps to let users log in, so that you could have a list of your favourite modules in your mobile app, command line client or any other search site you may wish to use, like Mark's JS site or something built server-side etc.
So, that's a high level overview of how something like this *might* work. The cool thing about this layer is that (I think) this is what makes the index interesting. What we have now is helpful, but not necessarily interesting, since it's really just a coming together of data from different sources. This sort of human feedback would make the index much more valuable and useful.
So, I'm just going to put this out there to get the discussion started. Feel free to tell me that I'm going about it the wrong way. :)
Olaf
*de-lurk and wavies around* Hi, hi! :-)
> What we had thought of was to add an authentication layer to the API -- something outside of ElasticSearch, I would think. Probably some 3rd party auth with tokens (like Twitter, Google, etc). How I *think* it might work is:
>
> * User comes to something.metacpan.org
> * User wants to create account
> * User authenticates via Twitter/3rd party
> * Account is created and user is logged in
>
> [..] We don't keep any private data on file. Anything which shouldn't be public, doesn't get housed on our end, with the exception, perhaps, of the auth tokens we'll need, which would reside outside of the index.
That all sounds pretty good and reasonable, and with OAuth
authentication and the like, it should be fairly easy to do.
> So, I'm just going to put this out there to get the discussion started. Feel free to tell me that I'm going about it the wrong way. :)
>
So far, everything sounds just dandy. Of course, as usual, the
devil will be in the details. I'll begin to refresh my memory on the
different authentication services to see how brimstonish that particular
devil will be. :-)
Joy,
`/anick
On 2010-12-17, at 4:34 PM, Yanick Champoux wrote:
> So far, everything sounds just dandy. Of course, as usual, the devil will be in the details. I'll begin to refresh my memory on the different authentication services to see how brimstonish that particular devil will be. :-)
That sounds excellent. The authentication will come in handy when Mark and I get back to the iPhone and iPad development. Right now we're storing user data on the phone, but really it could sync with the API, which would make a lot of sense. Let us know what you find.
On a related note, we were talking about this yesterday and realized that with some basic data like "favourite modules", it would be trivial to put together a jobs app which employers could use to find Perl developers based on the modules the devs like vs the modules required for a particular job. Anyway, that's just one application of the data in the API that I think would be useful.
Olaf
Very interesting alternate use of the data; I hadn't thought of
that one. :-)
Joy,
`/anick
[Twitter OAuth] Let us know what you find.
Both the Dancer and Catalyst solutions look to be fairly simple to implement. Dancer might be a nice way to go as I'm not sure this really requires all of the options Catalyst brings with it. It would be a nice, simple implementation.
As far as using Twitter for auth goes, I think that works well. We could look at a solution that has more options for services, but I'm not sure if we need that. My feeling on this is that we just need a decent authentication mechanism to start with. If anybody feels strongly that we need Twitter + some other service(s), they can certainly provide a patch. :) At the very least, if you already have a Twitter account, you don't need to create a new one. If you do need to create a Twitter account, it's not a lot of overhead. Basically, the easiest, fastest solution sounds best to me as long as it's easy to build on in future.
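To give a rough idea of the shape of it, here's an untested sketch of the Twitter leg in Dancer, with Net::Twitter doing the OAuth legwork (the consumer keys and route names are placeholders):

    use Dancer;
    use Net::Twitter;

    set session => 'Simple';   # in-memory sessions, just for the sketch

    # we'd get real keys by registering the app with Twitter
    sub twitter {
        Net::Twitter->new(
            traits          => [qw/ API::REST OAuth /],
            consumer_key    => 'OUR_CONSUMER_KEY',
            consumer_secret => 'OUR_CONSUMER_SECRET',
        );
    }

    get '/auth/twitter' => sub {
        my $nt  = twitter();
        my $url = $nt->get_authorization_url(
            callback => uri_for('/auth/twitter/callback') );

        # stash the request token for the callback leg
        session request_token        => $nt->request_token;
        session request_token_secret => $nt->request_token_secret;

        redirect $url;
    };

    get '/auth/twitter/callback' => sub {
        my $nt = twitter();
        $nt->request_token( session('request_token') );
        $nt->request_token_secret( session('request_token_secret') );

        my ( $token, $secret, $twitter_id, $username ) =
            $nt->request_access_token( verifier => params->{oauth_verifier} );

        session user => $username;   # account lookup/creation would go here
        redirect '/';
    };

    dance;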
Does anyone have any objections?
Yanick, as far as bringing this live goes, let me know if you need to dev this on the cloud instance or if you want to get it going locally first.
Olaf
And so far, it is. I'm going back to work tomorrow (boo! hiss!),
which should slow me down a wee bit. But I should be able to share
something before long.
> As far as using Twitter for auth goes, I think that works well. We could look at a solution that has more options for services, but I'm not sure if we need that. My feeling on this is that we just need a decent authentication mechanism to start with. If anybody feels strongly that we need Twitter + some other service(s), they can certainly provide a patch. :)
Heh. Yup. :-) The key is to have one authentication that works to
get us started, and to make sure that we architect things such that we
can add other ones later on if we want/need to.
> Does anyone have any objections?
> Yanick, as far as bringing this live goes, let me know if you need to dev this on the cloud instance or if you want to get it going locally first.
I'll work locally first. Once I have something relatively decent,
we can then see about setting it up in the cloud.
Joy (and Happy New Year!),
`/anick
> On 11-01-03 12:12 PM, Olaf Alders wrote:
>> Both the Dancer and Catalyst solutions look to be fairly simple to implement. Dancer might be a nice way to go as I'm not sure this really requires all of the options Catalyst brings with it. It would be a nice, simple implementation.
>
> And so far, it is. I'm going back to work tomorrow (boo! hiss!), which should slow me down a wee bit. But I should be able to share something before long.
Consider yourself lucky. I've been back since Monday. ;)
>> As far as using Twitter for auth goes, I think that works well. We could look at a solution that has more options for services, but I'm not sure if we need that. My feeling on this is that we just need a decent authentication mechanism to start with. If anybody feels strongly that we need Twitter + some other service(s), they can certainly provide a patch. :)
>
> Heh. Yup. :-) The key is to have one authentication that works to get us started, and to make sure that we architect things such that we can add other ones later on if we want/need to.
>> Does anyone have any objections?
I would call that deafening silence a nod of approval. The basic philosophy is that you go ahead and do what you think is best. If it's a bad idea, someone will let you know. That's also why we have version control. It's a good idea to bounce stuff off the list, but if there's no immediate response, it's probably fine with everyone.
>> Yanick, as far as bringing this live goes, let me know if you need to dev this on the cloud instance or if you want to get it going locally first.
>
> I'll work locally first. Once I have something relatively decent, we can then see how we can send it sitting in the cloud.
Looking forward to seeing it!
>
> Joy (and Happy New Year!),
> `/anick
Happy New Year!
Olaf
> Does anyone use reddit at all? Does its upvote/downvote system have any appeal for this project?
I don't use Reddit, but I *think* that's the general idea Yanick has in mind. But I think there are several layers here. The first is the authentication layer, which Yanick is building. The second is the format we choose to store the data in the index (no decisions yet) and the third would be the actual UI, which I don't think we've really touched on. I think the Reddit UI is a good example of how to do it well. I know Mark is planning on implementing up/downvoting in search.metacpan.org once the authentication is in place. Not sure what he had in mind, though. Are you volunteering for something? :)
BTW, let me know if you need any input on how to integrate your cpanratings work with the indexing script.
Olaf
> I don't use Reddit, but I *think* that's the general idea Yanick has in mind.
> But I think there are several layers here. The first is the authentication layer, which Yanick is building. The second is the format we choose to store the data in the index (no decisions yet) and the third would be the actual UI, which I don't think we've really touched on.
> I think the Reddit UI is a good example of how to do it well. I know Mark is planning on implementing up/downvoting in search.metacpan.org once the authentication is in place.
I'm pretty sure that reddit's core code is not available. But, as I
see it, the main up/downvoting functionality is pretty simple:
* you have things.
* you have peeps.
* you make relationships between peeps and things with a score (+1/-1).
Tada! All done! ;-)
Of course, the hard part is to craft your system such that it
scales to the number of peeps/things you have in mind.
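In DBIx::Class terms, that could boil down to a single result class, something like this (just a sketch; all names are hypothetical):

    package CpanVote::Schema::Result::Vote;
    use strict;
    use warnings;
    use base 'DBIx::Class::Core';

    __PACKAGE__->table('votes');
    __PACKAGE__->add_columns(
        user_id => { data_type => 'integer' },              # the peep
        dist    => { data_type => 'varchar', size => 128 }, # the thing
        vote    => { data_type => 'integer' },              # +1 or -1
    );

    # one vote per peep per thing
    __PACKAGE__->set_primary_key(qw/ user_id dist /);

    1;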
Joy,
`/anick
> On 11-01-04 09:53 PM, Olaf Alders wrote:
>> And so far, it is. I'm going back to work tomorrow (boo! hiss!), which should slow me down a wee bit. But I should be able to share something before long.
>>
>> Consider yourself lucky. I've been back since Monday. ;)
>>
>
> Oh, I know I'm complaining with my belly full, so to speak. I had two full vacation weeks before and during the Holidays. Blissful, blissful time filled with eggnog, fun German retro-cop tv series (don't ask) and happy hacking. :-)
Was it Tatort? That's my favourite German cop show.
>
>
>> I would call that deafening silence a nod of approval. The basic philosophy is that you go ahead and do what you think is best. If it's a bad idea, someone will let you know. That's also why we have version control. It's a good idea to bounce stuff off the list, but if there's no immediate response, it's probably fine with everyone.
>>
>
> Excellent, that's pretty much how I usually do my stuff. I always find it a little easier to show a prototype, so that we can have a kinda-solid base on which to argue.
Perfect!
>
> Talking of which, I *do* have a prototype! Code is at http://github.com/yanick/cpanvote (the subdirectories 'schema' and 'dancer' are the interesting ones), and the application itself is at http://cpanvote.babyl.ca/sample. The application is extremely, very, unapologetically basic, and is there only to illustrate how the service can be queried via ajax calls, so don't be too put off by it. :-)
>
> In all cases, here's a quick walk-through of what the demo does:
>
> * On the '/sample' page, there are 4 random modules. Their vote tallies are retrieved via ajax calls to 'http://cpanvote.babyl.ca/dist/<distname>/votes', which return json documents that look like:
>
> {"my_vote":"yea","meh":1,"yea":1,"nea":0,"total":2}
>
> The 'my_vote' element only appears if you are logged in, and if you already voted on that module.
>
> * We know if we are logged in via a call to 'http://cpanvote.babyl.ca/authenticated' (which, if we are logged in, returns the current username). If we are not logged in, we can do it via Twitter. The authentication process currently goes this way:
>
> 1. visit the Twitter OAuth authentication page. If we click "yeah, yeah, give permissions to that app", it ...
>
> 2. ... redirects us to the '/auth/twitter/callback' local page, which takes care of the app-side of the authentication
> handshake. If that succeeds, we are redirected to ...
>
> 3. ... the local '/welcome' page, which takes the twitter credential and links it to an internal user (which will be useful later on for configuration elements and for allowing users to authenticate with different systems).
>
> 3a. For now, the session itself is kept in a cookie, so we are truly keeping a minimum of information in the database. The tables for the internal user accounts and the credentials look like:
>
> CREATE TABLE Users (
> id INTEGER PRIMARY KEY NOT NULL,
> username varchar(20) NOT NULL
> );
>
> CREATE TABLE Auth (
> user_id integer,
> protocol varchar(10) NOT NULL,
> credential varchar(20) NOT NULL,
> PRIMARY KEY (protocol, credential)
> );
>
> As you can see, no password is kept on the server side (the "credential" for twitter is the twitter username). We might want to add session management on the server side for security reasons (right now, if you have the browser cookie, you'll be able to log in forever), but that's about it.
>
> Note: Hmm... "protocol" is the wrong label. "service" would be more adequate.
OK. So this is really exciting. The first integration of this concept could be tested in search.metacpan.org. After that, we'd like to get it going in iCPAN, which would be a huge plus. What are your thoughts on the storage mechanism for this? I *think* this info should be in the ElasticSearch index, but I'm not 100% sure about what kind of queries we'd be able to run on it there. Mark would have a much better idea about this end of things. As far as organization goes, we'd need to consider if we want this in the 'cpan' index of ES, or in a different index. Right now the 'cpan' index is just info we've gleaned from CPAN as well as the author-contributed JSON files. That index will periodically be recreated from scratch (I think), unlike this kind of information, so it may make sense to break it out, but keep it in the API under a different URL.
Having said that, if you'd like to play with the ES instance we're currently using, I've got a snapshot here:
www.metacpan.org/es.zip
Unzip and "bin/elasticsearch -f" will get you started.
So, that's not to say that we shouldn't use MySQL at all. I just think we need to think this part out a little more.
>
> 4. Once we are authenticated, we can return to 'http://cpanvote.babyl.ca/sample', where we should now see the voting buttons. Clicking on those triggers an ajax call to '/dist/<distname>/vote/(yea|nea|meh)', which records the vote.
>
>
>
> And that's pretty much it. All the magic revolves around the fact that the authentication token is set in a cookie for 'cpanvote.babyl.ca', so
> it'll be working for ajax calls coming from, say, search.metacpan.org or search.cpan.org.
Having that kind of seamless integration from different sites is a really big plus. Will we run into cross-domain issues with this?
Best,
Olaf
No, t'was "Der letzte Bulle". The premise is that a cop from the
80s (think big macho cop from the era that gave us Lethal Weapon,
Beverly Hill Cop, etc) gets shot and fall in a coma from which he wakes
up now, 20-something years later. Big culture clash. :-) The main
actor is the guy who was also playing the gay cop from Mit Herz und
Handschellen.
>
> OK. So this is really exciting. The first integration of this concept could be tested in search.metacpan.org. After that, we'd like to get it going in iCPAN, which would be a huge plus.
Meanwhile, I've also been playing with Greasemonkeying it on
search.cpan.org. The (almost) working GMscript is at my Github repo
(https://github.com/yanick/cpanvote), if you want to play with it.
> What are your thoughts on the storage mechanism for this? I *think* this info should be in the ElasticSearch index, but I'm not 100% sure about what kind of queries we'd be able to run on it there. Mark would have a much better idea about this end of things. As far as organization goes, we'd need to consider if we want this in the 'cpan' index of ES, or in a different index. Right now the 'cpan' index is just info we've gleaned from CPAN as well as the author-contributed JSON files. That index will periodically be recreated from scratch (I think), unlike this kind of information, so it may make sense to break it out, but keep it in the API under a different URL.
Right now, my knowledge of ElasticSearch is... very basic, so I
don't really have any serious thoughts on the topic. I'll read a little
bit on it, look at the metacpan code, and try to build some feeling for
it before I try my luck at giving an opinion. Till that day happens,
I consider you guys the experts, and will trust your judgment. :-)
> Having said that, if you'd like to play with the ES instance we're currently using, I've got a snapshot here:
>
> www.metacpan.org/es.zip
>
> Unzip and "bin/elasticsearch -f" will get you started.
Excellent! Will do, thanks!
>
> Having that kind of seamless integration from different sites is a really big plus. Will we run into cross-domain issues with this?
Yes. :-) But I'm already looking at them (via the GMscript
experiment). I think I'll be able to circumvent the cross-domain
problems via JSONP. Unfortunately, Dancer doesn't have a JSONP
serializer yet. Fortunately, it's not hard to implement on top of the
already-existing JSON serializer, so I might give it a whirl when I have
a few hours free.
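In case anyone is curious, the wrapper I have in mind is tiny -- something like this (untested, and real code should sanitize the callback name):

    use Dancer;

    set serializer => 'JSON';

    # if the request carried a ?callback=... parameter, wrap the
    # JSON body so the response becomes JSONP
    after sub {
        my $response = shift;
        my $cb = params->{callback} or return;

        $response->content( sprintf '%s(%s);', $cb, $response->content );
        $response->content_type('application/javascript');
    };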
Talking of which, while I'm using Dancer right now, I'm
semi-thinking about swinging back to Catalyst, as it already has the
JSONP serializer, and the chained actions could come in handy at a
couple places. In all cases, I'm at a stage where it's still easy to
switch back and forth, so it's not much of an issue (and a great
opportunity to compare the two frameworks).
Joy,
`/anick
>
>>> Oh, I know I'm complaining with my belly full, so to speak. I had two full vacation weeks before and during the Holidays. Blissful, blissful time filled with eggnog, fun German retro-cop tv series (don't ask) and happy hacking. :-)
>> Was it Tatort? That's my favourite German cop show.
>
> No, t'was "Der letzte Bulle". The premise is that a cop from the 80s (think big macho cop from the era that gave us Lethal Weapon, Beverly Hill Cop, etc) gets shot and fall in a coma from which he wakes up now, 20-something years later. Big culture clash. :-) The main actor is the guy who was also playing the gay cop from Mit Herz und Handschellen.
I now consider myself educated. I'll have to check those out.
>
>>
>> OK. So this is really exciting. The first integration of this concept could be tested in search.metacpan.org. After that, we'd like to get it going in iCPAN, which would be a huge plus.
>
> Meanwhile, I've also been playing with Greasemonkeying it on search.cpan.org. The (almost) working GMscript is at my Github repo (https://github.com/yanick/cpanvote), if you want to play with it.
I've just tried it out. The authentication isn't working for me, but I suppose that's what you mean by "almost". :) It's a great concept, though. Would be nice to see people using it via GreaseMonkey. We could then also add this to the cpan-mangler script so that GreaseMonkey would not be required at all.
>
>> What are your thoughts on the storage mechanism for this? I *think* this info should be in the ElasticSearch index, but I'm not 100% sure about what kind of queries we'd be able to run on it there. Mark would have a much better idea about this end of things. As far as organization goes, we'd need to consider if we want this in the 'cpan' index of ES, or in a different index. Right now the 'cpan' index is just info we've gleaned from CPAN as well as the author-contributed JSON files. That index will periodically be recreated from scratch (I think), unlike this kind of information, so it may make sense to break it out, but keep it in the API under a different URL.
>
> Right now, my knowledge of ElasticSearch is... very basic, so I don't really have any serious thoughts on the topic. I'll read a little bit on it, look at the metacpan code, and try to build some feeling for it before I try my luck at giving an opinion. Till that day happens, I consider you guys the experts, and will trust your judgment. :-)
We've been getting a lot of our help from #elasticsearch on IRC. My problem is that I don't know how easy or complex it would be to get the kind of data from ElasticSearch which would make it easy to create reports on most upvoted and most downvoted modules etc, since I'd generally be doing that in SQL. Will be an interesting problem to look at, though.
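Just to make the question concrete, I *think* the SQL-style "GROUP BY dist ORDER BY COUNT(*)" report would translate into a terms facet, something like this (totally untested, and the index and field names are made up):

    use ElasticSearch;

    my $es = ElasticSearch->new( servers => '127.0.0.1:9200' );

    my $result = $es->search(
        index  => 'cpanvote',                      # hypothetical index
        type   => 'vote',
        query  => { term => { vote => 'yea' } },   # only the upvotes
        size   => 0,                               # we just want the counts
        facets => {
            top_dists => { terms => { field => 'dist', size => 25 } },
        },
    );

    # each term comes back with its document count
    for my $t ( @{ $result->{facets}{top_dists}{terms} } ) {
        printf "%-40s %d\n", $t->{term}, $t->{count};
    }

If that holds up, the "most downvoted" report is the same query with 'nea'.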
Best,
Olaf
Over the week-end, I returned to the Catalyst version of the app,
and reimplemented everything. It's now
running on cpanvote.babyl.ca (and the code in my Github repo has been
updated). Besides the switch to the big C, here's what's new:
* sessions are now stored in the database. For now, the plan is to
use the sessions merely as an authentication token.
* the greasemonkey script works (at least for me). Note that if
you want to play with it, don't forget to
change the url inside the code from enkidu to cpanvote.babyl.ca.
* Recommendations are added. They can be queried via

    GET /dist/<distname>/instead

and one can register his or her own recommendation via

    PUT /dist/<distname>/instead/<other distname>

(note that this will soon change to
/dist/<distname>/instead/use/<other distname> so as not to paint myself
into a namespace corner). A quick usage sketch follows this list.
* After one goes through the authentication loop with Twitter,
she's automatically redirected to the original webpage.
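And here's the usage sketch promised above -- the dist names are hypothetical, and a real PUT would need the authentication cookie from the Twitter login:

    use HTTP::Tiny;
    use JSON::PP qw( decode_json );

    my $http = HTTP::Tiny->new;
    my $base = 'http://cpanvote.babyl.ca';

    # register a recommendation: "instead of CGI, use Plack"
    my $res = $http->put("$base/dist/CGI/instead/Plack");
    die "PUT failed: $res->{status}" unless $res->{success};

    # query the recommendations for CGI
    $res = $http->get("$base/dist/CGI/instead");
    my $recs = decode_json( $res->{content} );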
Aaaaaand... that's it. Doesn't sound like a lot, but the
Authentication sub-system of Catalyst gave me a good brain workout
before I began to grok it correctly.
Next steps for this week include beginning to document the
service's URIs and add a few more basic ones. Oh, and blog about my
adventures. :-)
And now, for the ongoing conversation:
>> Meanwhile, I've also been playing with Greasemonkeying it on search.cpan.org. The (almost) working GMscript is at my Github repo (https://github.com/yanick/cpanvote), if you want to play with it.
> I've just tried it out. The authentication isn't working for me, but I suppose that's what you mean by "almost". :)
Yup. :-)
> It's a great concept, though. Would be nice to see people using it via GreaseMonkey. We could then also add this to the cpan-mangler script so that GreaseMonkey would not be required at all.
Totally. If we do that, the only thing we should have to alter in
the GMscript is the ajax calls, which will have to go from JSON to JSONP,
to get around the cross-domain limitations.
> We've been getting a lot of our help from #elasticsearch on IRC. My problem is that I don't know how easy or complex it would be to get the kind of data from ElasticSearch which would make it easy to create reports on most upvoted and most downvoted modules etc, since I'd generally be doing that in SQL. Will be an interesting problem to look at, though.
Sounds like a nice spiky problem to think about at the next
meeting. :-)
Joy,
`/anick
> Ooookay...
>
> Over the week-end, I returned to the Catalyst version of the app, and reimplemented everything. It's now
> running on cpanvote.babyl.ca (and the code in my Github repo has been updated). Besides the switch to the big C, here's what's new:
>
> * sessions are now stored in the database. For now, the plan is to use the sessions merely as an authentication token.
>
> * the greasemonkey script works (at least for me). Note that if you want to play with it, don't forget to
> change the url inside the code from enkidu to cpanvote.babyl.ca.
The GM script works very well for me.
>
> * Recommendations are added. They can be queried via
>
> GET /dist/<distname>/instead
>
> and one can register his or her own recommendation via
>
> PUT /dist/<distname>/instead/<other distname>
>
> (note that this will soon change to /dist/<distname>/instead/use/<other distname> so as not to paint myself into a namespace corner)
>
> * After one goes through the authentication loop with Twitter, she's automatically redirected to the original webpage.
>
>
> Aaaaaand... that's it. Doesn't sound like a lot, but the Authentication sub-system of Catalyst gave me a good brain workout before I began to grok it correctly.
Worked beautifully for me.
>
>> It's a great concept, though. Would be nice to see people using it via GreaseMonkey. We could then also add this to the cpan-mangler script so that GreaseMonkey would not be required at all.
>
> Totally. If we do that, the only thing we should have to alter in the GMscript is the ajax calls, which will have to go from JSON to JSONP, to get around the cross-domain limitations.
Let me know when you think the GM script is mature enough and we'll look at tweaking the cpan-mangler to add it in.
>
>> We've been getting a lot of our help from #elasticsearch on IRC. My problem is that I don't know how easy or complex it would be to get the kind of data from ElasticSearch which would make it easy to create reports on most upvoted and most downvoted modules etc, since I'd generally be doing that in SQL. Will be an interesting problem to look at, though.
>
> Sounds like a nice spiky problem to think about at the next meeting. :-)
I think so. Next meeting is Thursday of next week. :)
Best,
Olaf
On 11-01-19 10:18 AM, Olaf Alders wrote:
>> * the greasemonkey script works (at least for me). Note that if you
>> want to play with it, don't forget to
>> change the url inside the code from enkidu to cpanvote.babyl.ca.
> The GM script works very well for me.
Yay! \o/ It's always fun to see it work on one's own computer,
but it only gets *really* thrilling when it begins to run elsewhere too. :-)
>> Aaaaaand... that's it. Doesn't sound like a lot, but the Authentication sub-system of Catalyst gave me a good brain workout before I began to grok it correctly.
> Worked beautifully for me.
Puuurfect.
>> Totally. If we do that, the only thing we should have to alter in the GMscript is the ajax calls, which will have to go from JSON to JSONP, to get around the cross-domain limitations.
> Let me know when you think the GM script is mature enough and we'll look at tweaking the cpan-mangler to add it in.
I've uploaded to GitHub my latest version of the GM script and the
Catalyst app. That version of the script is, I think, sufficiently close
to the final version that you can begin to play with it (it still has to
be prettified, but the ajax calls should remain the same).
Caveat, though: the machine 'cpanvote.babyl.ca' is my main server,
in my basement, connected to the 'Net via my crummy Bell (well,
teksavvy) connection. The JSON traffic is pretty light, and I'm sure it
can withstand testing traffic, but we'll definitely move the whole
thing somewhere a little more, ah, sturdy before we begin to use it in
earnest. :-)
Oh, but I've switched the database backend to be Postgres instead
of SQLite, so at least it should be a little more robust than it was a
few days ago.
>>
>> Sounds like a nice spiky problem to think about at the next meeting. :-)
> I think so. Next meeting is Thursday of next week. :)
>
Hope it was a good one. :-)
Joy,
`/anick
On 2011-01-27, at 9:26 PM, Yanick Champoux wrote:
>>> Totally. If we do that, the only thing we should have to alter in the GMscript is the ajax calls, which will have to go from JSON to JSONP, to get around the cross-domain limitations.
>> Let me know when you think the GM script is mature enough and we'll look at tweaking the cpan-mangler to add it in.
>
> I've uploaded to GitHub my latest version of the GM script and the Catalyst app. That version of the script is, I think, sufficiently close to the final version that you can begin to play with it (it still has to be prettified, but the ajax calls should remain the same).
Excellent.
>
> Caveat, though: the machine 'cpanvote.babyl.ca' is my main server, in my basement, connected to the 'Net via my crummy Bell (well, teksavvy) connection. The JSON traffic is pretty light, and I'm sure it can withstand testing traffic, but we'll definitely move the whole thing somewhere a little more, ah, sturdy before we begin to use it in earnest. :-)
I think it would make sense to move it to the same cloud instance as our ElasticSearch back end. The reason I say this is that the easy way to update ElasticSearch is by doing it directly from localhost. Otherwise, we'd have to figure out something which is a little more complicated.
>
> Oh, but I've switched the database backend to be Postgres instead of SQLite, so at least it should be a little more robust than it was a few days ago.
Great!
>>> Sounds like a nice spiky problem to think about at the next meeting. :-)
>> I think so. Next meeting is Thursday of next week. :)
>>
>
> Hope it was a good one. :-)
It was! Mark and I talked it over and I think the best way to go right now would be to keep the stats in 2 places. That sounds stupid, but the issue is that:
a) we really need them in ElasticSearch so that we can join this data with dist info when we do searches
b) the whole ElasticSearch end isn't stable yet. I still occasionally trash the index and start from scratch. I can do that easily because all of our info (up to this point) can quite easily be recreated and inserted. That's not the case with user-contributed info like this.
So, ideally, to start with, I think we should use your interface for the voting, but I'd like to see a method in the cpanvote app which then also updates the ElasticSearch index with the same info. The only other functionality we'd need is a way to batch export *everything* into ElasticSearch, which is what we'd need to do when the index is recreated or if we just wanted to make sure everything is up to date.
I don't think it's much extra work since you've already got the database end working and the inserts into ElasticSearch are actually trivial. So, it functions as a backup system and it means you don't need to rework the core functionality of what you've done in order to make this work with MetaCPAN. I can certainly write some code for this end of things, if you'd like me to. My only personal issue is that I'm awaiting the arrival of a newborn in the coming weeks. Once that happens, I'll have a lot of things going on which will keep me from coding. :)
Do you have any concerns about this dual use of Postgres and ElasticSearch?
Best,
Olaf
On 11-01-29 12:07 AM, Olaf Alders wrote:
>> Caveat, though: the machine 'cpanvote.babyl.ca' is my main server,
>> in my basement, connected to the 'Net via my crummy Bell (well,
>> teksavvy) connection. The JSON traffic is
>> pretty light, and I'm sure I can withstand testing traffic, but we'll
>> definitively move the whole thing somewhere a little more, ah, sturdy
>> before we begin to use it in earnest. :-)
>
> I think it would make sense to move it to the same cloud instance as
> our ElasticSearch back end. The reason I say this is that the easy
> way to update ElasticSearch is by doing it directly from localhost.
> Otherwise, we'd have to figure out something which is a little more
> complicated.
No, that's perfectly fine with me. :-)
>>> I think so. Next meeting is Thursday of next week. :)
>>>
>>
>> Hope it was a good one. :-)
>
> It was! Mark and I talked it over and I think the best way to go
> right now would be to keep the stats in 2 places. [..]
Actually, that makes a lot of sense. Of course, it commits the sin of
having information in two different places but, hey, we are walking the
Earth, not Heaven -- nothing works down here without a venial sin or
two. ;-)
If someone can help me with the REST uri and JSON to use with the
ElasticSearch instance, I'm pretty sure I can quickly come up with an
export script.
Hmmm... In light of that syncing between the two back-ends, we should
add a 'last_change' timestamp on the user's votes so that we can easily
grab the deltas in-between syncs. And it's also good data to have
anyway to give a temporal dimension to the votes.
> I don't think it's much extra work since you've already got the
> database end working and the inserts into ElasticSearch are actually
> trivial. So, it functions as a backup system and it means you don't
> need to rework the core functionality of what you've done in order to
> make this work with MetaCPAN.
Not that I would mind any rework, mind you. As far as I'm concerned,
it's all exploratory code done for the sheer fun of it. That it actually
solves our problem is merely a nice side-effect. ;-)
But this being said, I really don't think the interaction is going to
be very hard. I even suspect it'll all boil down to something like:
    my $rs = $schema->resultset('Votes')->search({
        last_change => { '>' => $last_sync },
    });

    while ( my $v = $rs->next ) {
        update_elastic_search_with( $v );
    }
or some variation of it.
> I can certainly write some code for
> this end of things, if you'd like me to. My only personal issue is
> that I'm awaiting the arrival of a newborn in the coming weeks.
Oh my! Congratulations! Is it your first child process? :-)
> Once
> that happens, I'll have a lot of things going on which will keep me
> from coding. :)
No kidding. Uh, no, wait. Actually, it's going to be, literally,
kidding. But yeah, I know exactly what you mean. :-)
> Do you have any concerns about this dual use of Postgres and
> ElasticSearch?
None at all. This is what makes the most sense for the current
situation, and I don't see it as painting us in any awkward corner, so
I'd say let's go for it.
Oh, and a last note: I used Postgres because it's my default db of
choice, but I'm not using any special feature that binds us to it. So if
we have a good rationale to go with MySQL instead, we can trivially jump
ship.
Joy,
`/anick
On 2011-01-29, at 1:41 PM, Yanick Champoux wrote:
>> It was! Mark and I talked it over and I think the best way to go
>> right now would be to keep the stats in 2 places. [..]
>
> Actually, that makes a lot of sense. Of course, it commits the sin of having information in two different places but, hey, we are walking the Earth, not Heaven -- nothing works down here without a venial sin or two. ;-)
>
> If someone can help me with the REST uri and JSON to use with the ElasticSearch instance, I'm pretty sure I can quickly come up with an export script.
xsawyerx is actually just working on a Perl wrapper around the REST API, but you can also use the ElasticSearch module directly. If you have a look at elasticsearch/index_cpanratings.pl in the CPAN-API repo, that should give you an idea. I should probably put something together that's a little more in-depth, though.
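To give you a taste of it, the insert itself is something like this (a sketch with made-up index and field names, not the actual script):

    use ElasticSearch;

    my $es = ElasticSearch->new( servers => '127.0.0.1:9200' );

    my ( $user_id, $dist, $vote ) = ( 'yanick', 'Foo-Bar', 'yea' );

    $es->index(
        index => 'cpanvote',
        type  => 'vote',
        id    => "$user_id-$dist",      # one doc per (peep, thing) pair
        data  => {
            user_id     => $user_id,
            dist        => $dist,
            vote        => $vote,       # 'yea', 'nea' or 'meh'
            last_change => time,        # for grabbing deltas between syncs
        },
    );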
>> I don't think it's much extra work since you've already got the
>> database end working and the inserts into ElasticSearch are actually
>> trivial. So, it functions as a backup system and it means you don't
>> need to rework the core functionality of what you've done in order to
>> make this work with MetaCPAN.
>
> Not that I would mind any rework, mind you. As far as I'm concerned, it's all exploratory code done for the sheer fun of it. That it actually solves our problem is merely a nice side-effect. ;-)
>
> But this being said, I really don't think the interaction is going to be very hard. I even suspect it'll all boil down to something like:
>
> my $rs = $schema->resultset('Votes')->search({
> last_change => { '>' => $last_sync }
> });
>
> while( my $v = $rs->next ) {
> update_elastic_search_with( $v );
> }
>
>
> or some variation of it.
That looks good to me.
>
>
>> I can certainly write some code for
>> this end of things, if you'd like me to. My only personal issue is
>> that I'm awaiting the arrival of a newborn in the coming weeks.
>
> Oh my! Congratulations! Is it your first child process? :-)
Thank you! It's the second iteration, 21 months apart. The first time I didn't know what I was in for, but I was well rested. Now I have a good idea of what to expect, but I'm sooooo tired from the first one. :)
>
>> Once
>> that happens, I'll have a lot of things going on which will keep me
>> from coding. :)
>
> No kidding. Uh, no, wait. Actually, it's going to be, literally, kidding. But yeah, I know exactly what you mean. :-)
:)
>
>
>> Do you have any concerns about this dual use of Postgres and
>> ElasticSearch?
>
> None at all. This is what makes the most sense for the current situation, and I don't see it as painting us in any awkward corner, so I'd say let's go for it.
Great!
>
> Oh, and a last note: I used Postgres because it's my default db of choice, but I'm not using any special feature that binds us to it. So if we have a good rationale to go with MySQL instead, we can trivially jump ship.
I've never used Postgres, so this would probably be a good opportunity for me to learn something about it.
Best,
Olaf
On 11-01-30 11:50 PM, Olaf Alders wrote:
>> If someone can help me with the REST uri and JSON to use with the
>> ElasticSearch instance, I'm pretty sure I can quickly come up with
>> an export script.
>
> xsawyerx is actually just working on a Perl wrapper around the REST
> API, but you can also use the ElasticSearch module directly. If you
> have a look at elasticsearch/index_cpanratings.pl in the CPAN-API
> repo, that should give you an idea. I should probably put something
> together that's a little more in-depth, though.
I'll have a look, thanks. And we have xsawyerx on board as well? Ooooh,
excellent. :-)
>> Oh my! Congratulations! Is it your first child process? :-)
>
> Thank you! It's the second iteration, 21 months apart. The first
> time
> I didn't know what I was in for, but I was well rested. Now I have a
> good idea of what to expect, but I'm sooooo tired from the first one. :)
It should be easier this time around, as knowing what to expect is
already half the battle. But then, every child is a roll of the dice, and
you never know whether you'll get an angel or, er, lots of personality. In
all cases, I'm keeping my fingers crossed for you for the easiest ride
possible. :-)
>> Oh, and a last note: I used Postgres because it's my default db of
>> choice, but I'm not using any special feature that binds us to it. So if
>> we have a good rationale to go with MySQL instead, we can trivially jump
>> ship.
>
> I've never used Postgres, so this would probably be a good opportunity for me to learn something about it.
Lesson #1 with Postgres that made me swear quite a lot last week-end:
it takes its case-sensitivity very seriously. If you make
the mistake of calling your table Users, you'll have to write your queries
like

    SELECT * FROM "Users"

No double-quotes? Postgres says there is no such table. That... didn't
play too well with DBIx::Class. Once I finally cottoned on to what the
core issue was, it was easy enough to change the table names to all
lowercase, but in the meantime I had fun learning how DBI can
auto-quote its tokens. :-)
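For posterity, the knob I was looking for (or at least what I believe it is -- untested with our schema) is the quoting attributes on the connect call, which make DBIx::Class double-quote every identifier for you:

    # CpanVote::Schema is our hypothetical schema class
    my $schema = CpanVote::Schema->connect(
        'dbi:Pg:dbname=cpanvote', $user, $pass,
        {
            quote_char => '"',   # wrap identifiers in double-quotes
            name_sep   => '.',   # so schema.table splits correctly
        },
    );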
Joy,
`/anick
Just an FYI that I'll hit 'apt-get install postgres' on www.metacpan.org
in a few minutes. That shouldn't affect anything, but just in case it
does, well, at least you know it's my fault. :-)
Also, I've created the user 'cpanvote' that will run the cpanvote app.
I've copied the .ssh authorized_keys file from the root user to that
account, so that whoever can log in as root can also log in directly as
cpanvote as well.
Joy,
`/anick
>
> Just a quick pre-week-end chirp to say that the GreaseMonkey script has been extended to mangle everything that looks like a dist page on search.cpan.org. :-)
>
> On 11-01-30 11:50 PM, Olaf Alders wrote:
>>> If someone can help me with the REST uri and JSON to use with the
>>> ElasticSearch instance, I'm pretty sure I can quickly come up with
>>> an export script.
>>
>> xsawyerx is actually just working on a Perl wrapper around the REST
>> API, but you can also use the ElasticSearch module directly. If you
>> have a look at elasticsearch/index_cpanratings.pl in the CPAN-API
>> repo, that should give you an idea. I should probably put something
>> together that's a little more in-depth, though.
> I'll have a look, thanks. And we have xsawyerx on board as well? Ooooh, excellent. :-)
We do! http://search.cpan.org/perldoc?MetaCPAN::API
Also just chatting with Shlomi Fish in the IRC channel and he has some interest in module tagging, which I believe ties into work he started on with CPANHQ. He may get in touch with you about that.
Best,
Olaf
Well, I should note that the distribution's maintainer can already tag
modules by populating the META.yml 'keywords' list with arbitrary, free-form
strings. My idea was to reserve several namespaces of such keywords
(distinguished by a common prefix and then a slash) to have special meaning:
1. https://bitbucket.org/shlomif/rethinking-cpan/overview - CPAN "definitive
tags" which assign some meta-data to the distribution like "works_on_linux",
"works_on_windows", "magic", "black_magic", "deprecated", "stable_api", etc.
2. A classification system - possibly a http://en.wikipedia.org/wiki/Taxonomy
(similar to Freshmeat.net's old trove categorisation, or what is present in
only more suited towards CPAN.). We can have something with a central control
like Freshmeat.net used to have (which requires maintenance and a central
authority) or maybe allow something like wikipedia's categories, which are
world-editable and definable (and so may contain some directed loops, and may
be misused.).
--------------
Naturally, I don't want to completely eschew free-form tags. And of course,
there should be a way for unprivileged users to tag such modules, although
possibly with a lesser value.
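For illustration, here is how a distribution could declare such keywords from its Makefile.PL via META_MERGE (the module name is invented, and the prefixed tag is only my proposed convention, not anything standardised):

    use ExtUtils::MakeMaker;

    WriteMakefile(
        NAME       => 'My::Module',
        VERSION    => '0.01',
        META_MERGE => {
            # 'keywords' is part of the CPAN META spec; the
            # 'cpan-def/' prefix is just the proposed convention
            keywords => [ 'XML', 'parsing', 'cpan-def/stable_api' ],
        },
    );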
Regards,
Shlomi Fish
--
-----------------------------------------------------------------
Shlomi Fish http://www.shlomifish.org/
List of Portability Libraries - http://shlom.in/port-libs
Chuck Norris can make the statement "This statement is false" a true one.
Please reply to list if it's a mailing list post - http://shlom.in/reply .
On 11-02-10 06:02 PM, Shlomi Fish wrote:
> Well, I should note that the distribution's maintainer can already tag
> modules by populating the META.yml 'keywords' list with arbitrary, free-form
> strings.
True. And there is, of course, the namespace of a module itself that
can be used to seed its tags (e.g., XML::LibXML can be given the tags
'XML' and 'LibXML' by default).
> 2. A classification system - possibly a http://en.wikipedia.org/wiki/Taxonomy
> (similar to Freshmeat.net's old trove categorisation, or what is present in
> only more suited towards CPAN.). We can have something with a central control
> like Freshmeat.net used to have (which requires maintenance and a central
> authority) or maybe allow something like wikipedia's categories, which are
> world-editable and definable (and so may contain some directed loops, and may
> be misused.).
>
> --------------
>
> Naturally, I don't want to completely eschew free-form tags. And of course,
> there should be a way for unprivileged users to tag such modules, although
> possibly with a lesser value.
I must say, I'm more partial to free-form tags, but both can easily be
experimented with.
Joy,
`/anick
> Hi Shlomi, guys,
>
> On 11-02-10 06:02 PM, Shlomi Fish wrote:
>> Well, I should note that the distribution's maintainer can already tag
>> modules by populating the META.yml 'keywords' list with arbitrary, free-form
>> strings.
>
> True. And there is, of course, the namespace of a module itself that can be used to seed its tags (e.g., XML::LibXML can be given the tags 'XML' and 'LibXML' by default).
Ah, good idea.
>
>> 2. A classification system - possibly a http://en.wikipedia.org/wiki/Taxonomy
>> (similar to Freshmeat.net's old trove categorisation, or what is present in
>> only more suited towards CPAN.). We can have something with a central control
>> like Freshmeat.net used to have (which requires maintenance and a central
>> authority) or maybe allow something like wikipedia's categories, which are
>> world-editable and definable (and so may contain some directed loops, and may
>> be misused.).
>>
>> --------------
>>
>> Naturally, I don't want to completely eschew free-form tags. And of course,
>> there should be a way for unprivileged users to tag such modules, although
>> possibly with a lesser value.
>
> I must say, I'm more partial to free-form tags, but both can easily be experimented with.
As am I. My feeling would be that we could have some default tags, but that you should be able to tag anything as you wish. As far as tag values go, I would think it's not really our job to weight tags. That can be left as an exercise for the reader. I would think we could have a system of weighting tags in search.metacpan.org etc, but the job of the API really is just to record tags and spit them back out. It's a "judgement-free zone". :)
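Following the pattern of the voting endpoints, I'd picture something as simple as (hypothetical, nothing decided):

    GET /dist/<distname>/tags
    PUT /dist/<distname>/tag/<tag>

with the GET returning the raw tags and their counts, and any weighting left to the clients.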
Best,
Olaf