MetaCPAN: Week #9

Moritz Onken

Jul 29, 2011, 12:53:48 PM

to tpf-gsoc...@googlegroups.com, Olaf Alders, Clinton Gormley

Hi,

as promised in the last mail, +1 went live [1] and is heavily used. With 450 +1's in just over a week, it seems to be quite well accepted. I planned to have tagging implemented by now but had no time to do so, as I was working extensively on my master's thesis. Most of the time I spent on MetaCPAN went into fixing bugs, which pile up quickly in the bug tracker.

I started tweaking the result scoring because there were lots of complaints that searches for e.g. Perl::Critic didn't return Perl::Critic as the first result. I made a list of all those cases and wrote a test that verifies that those searches return the appropriate result first. However, new failing cases keep arriving, and Clinton and I agreed that we have to write our own tokenizer. Right now we are using the ElasticSearch tokenizer, which does a good job at splitting CamelCased words, but it's not good enough for our needs. The main problem is that module names are split into terms with duplicates (e.g. Perl::Critic::Utils::Perl becomes perl, critic, utils, perl), which messes with the result scoring. Changing the tokenizer means reindexing all of CPAN. In order to do this faster (it usually takes days), I started to improve the performance of the indexer with the help of Tim Bunce. I was able to shave off about 50% of the runtime. Next up would be to write that tokenizer and reindex everything.
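To make the duplicate-term problem concrete, here is a rough sketch in Python of what a module-name tokenizer could do. This is not the MetaCPAN tokenizer (which is not written yet) and not ElasticSearch's; the function name `tokenize_module_name`, the `dedupe` flag, and the regex are assumptions made up for illustration. It splits on `::` and on CamelCase boundaries, lowercases the terms, and optionally drops repeated terms while keeping their first position.

```python
import re

# Hypothetical pattern: an acronym run (HTTP), a capitalized or
# lowercase word (Critic, utils), or a run of digits.
_TERM = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def tokenize_module_name(name, dedupe=True):
    """Split a module name into lowercase terms.

    Illustrative only: splits on '::' and CamelCase boundaries and,
    if dedupe is true, keeps just the first occurrence of each term.
    """
    terms = []
    for part in name.split("::"):
        terms.extend(m.group(0).lower() for m in _TERM.finditer(part))

    if not dedupe:
        return terms

    seen = set()
    unique = []
    for term in terms:
        if term not in seen:
            seen.add(term)
            unique.append(term)
    return unique


if __name__ == "__main__":
    name = "Perl::Critic::Utils::Perl"
    print(tokenize_module_name(name, dedupe=False))
    print(tokenize_module_name(name))
```

With `dedupe=False` the example name yields perl, critic, utils, perl, which is the duplicated form described above; with the default it yields perl, critic, utils. Whether dropping duplicates is the right fix for the scoring is an open question here, it is just one possible approach.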

Schedule-wise I'm a bit behind. Tagging should have been done by this week. I hope to catch up over the next few days. However, my master's thesis requires a considerable amount of attention right now, as I made some huge progress there as well.

Cheers,
mo

[1] http://api.metacpan.org/favorite?size=10&sort=date:desc
