GSoC 2017: Increase Crawling Performance through page clustering

Mahmoud Mohammadi

Feb 28, 2017, 11:00:21 AM
to portia-scraper
Dear Mentors,

I am Mahmoud, a computer science grad student from the US. I am planning to contribute to GSoC 2017.
I love Python for its powerful libraries and syntax; I have used it, for example, in machine learning courses and in Jupyter notebooks. While exploring the ideas listed at gsoc2017.scrapinghub.com, I found "Increase Crawling Performance through page clustering", an interesting topic that I would like to contribute to.

I have good knowledge of web programming and of browser-simulation libraries such as Selenium and JWebUnit from projects I have done. I have also contributed to an open-source project (http://appsensor.org/), providing some libraries that connect to its REST API web services.
Moreover, I've done programming projects related to information retrieval on Wikipedia and Twitter data.

I see that Ruairi Fahy is listed as the mentor for this idea, and I hope to get more information, warm-up tasks, or some starting guides for it.

Thank you,
Mahmoud
