Scraping data from match statistics menu on WhoScored?


Mario Omescu

Aug 22, 2016, 10:11:15 AM
to Web Scraper

Hi,


Is there a way to go through all of the statistics menus and scrape all of the data?

Mārtiņš Balodis

Aug 23, 2016, 3:46:27 AM
to Mario Omescu, Web Scraper
Hi,
You can do that with the Element click selector. I would suggest scraping each tab with a separate sitemap, because you won't be able to merge player stats from multiple tabs. Here is an example sitemap that scrapes the Offensive tab.

{"_id":"whoscored2","startUrl":"https://www.whoscored.com/Matches/1080509/LiveStatistics/England-Premier-League-2016-2017-Chelsea-West-Ham","selectors":[{"parentSelectors":["_root"],"type":"SelectorElementClick","multiple":true,"id":"click-tab","selector":"div.statistics-table-tab:has(table.grid)","clickElementSelector":"div.option-group li a:contains('Offensive')","clickElementUniquenessType":"uniqueCSSSelector","clickType":"clickOnce","discardInitialElements":true,"delay":"2000"},{"parentSelectors":["click-tab"],"type":"SelectorTable","multiple":true,"id":"table","selector":"table.grid","tableHeaderRowSelector":"thead tr","tableDataRowSelector":"tbody tr","columns":[{"header":"R","name":"R","extract":true},{"header":"Player","name":"Player","extract":true},{"header":"Shots","name":"Shots","extract":true},{"header":"ShotsOT","name":"ShotsOT","extract":true},{"header":"KeyPasses","name":"KeyPasses","extract":true},{"header":"PA%","name":"PA%","extract":true},{"header":"AerialsWon","name":"AerialsWon","extract":true},{"header":"Touches","name":"Touches","extract":true},{"header":"Rating","name":"Rating","extract":true},{"header":"Key Events","name":"Key Events","extract":true}],"delay":""}]}
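
Since each tab needs its own sitemap, the per-tab copies could be generated from the example above instead of being edited by hand. The following is a minimal Python sketch, not part of Web Scraper itself: `make_tab_sitemap` is a hypothetical helper, the generated `_id` naming is just one possible convention, and any tab name other than Offensive is an assumption about the page. The column list is left empty in the base sitemap because each tab shows different headers, so the columns have to be supplied per tab.

```python
import copy
import json

# Trimmed copy of the sitemap above. "columns" is empty on purpose:
# the headers differ per tab and must be filled in for each one.
BASE_SITEMAP = {
    "_id": "whoscored2",
    "startUrl": "https://www.whoscored.com/Matches/1080509/LiveStatistics/"
                "England-Premier-League-2016-2017-Chelsea-West-Ham",
    "selectors": [
        {
            "parentSelectors": ["_root"],
            "type": "SelectorElementClick",
            "multiple": True,
            "id": "click-tab",
            "selector": "div.statistics-table-tab:has(table.grid)",
            "clickElementSelector": "div.option-group li a:contains('Offensive')",
            "clickElementUniquenessType": "uniqueCSSSelector",
            "clickType": "clickOnce",
            "discardInitialElements": True,
            "delay": "2000",
        },
        {
            "parentSelectors": ["click-tab"],
            "type": "SelectorTable",
            "multiple": True,
            "id": "table",
            "selector": "table.grid",
            "tableHeaderRowSelector": "thead tr",
            "tableDataRowSelector": "tbody tr",
            "columns": [],
            "delay": "",
        },
    ],
}


def make_tab_sitemap(tab_name, columns):
    """Return a copy of BASE_SITEMAP that clicks `tab_name` (hypothetical helper)."""
    sitemap = copy.deepcopy(BASE_SITEMAP)  # never mutate the shared template
    sitemap["_id"] = "whoscored-" + tab_name.lower()
    click, table = sitemap["selectors"]
    click["clickElementSelector"] = (
        "div.option-group li a:contains('%s')" % tab_name
    )
    table["columns"] = [
        {"header": name, "name": name, "extract": True} for name in columns
    ]
    return sitemap


if __name__ == "__main__":
    # "Defensive" and these column names are assumptions; check the real headers.
    print(json.dumps(make_tab_sitemap("Defensive", ["Player", "Rating"])))
```

The printed JSON can then be pasted into Web Scraper's "Import Sitemap" dialog, one sitemap per tab.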

--
You received this message because you are subscribed to the Google Groups "Web Scraper" group.
To unsubscribe from this group and stop receiving emails from it, send an email to web-scraper+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mario Omescu

Aug 26, 2016, 4:49:51 AM
to Web Scraper
Hi Mārtiņš.

Sorry for replying so late, and thank you so much for your support. It helps a lot.
I do have a couple more questions:
1. When I try to replicate the following, I cannot do it: "selector":"div.statistics-table-tab:has(table.grid)". Did you do something specific to choose just that portion?

2. Is there a way to navigate through the weeks and scrape all of the matches of the season? The URL does not change, and although I have tried most of the elements, I can only scrape the first week.

3. Is there a way to scrape the Chalkboard tab?

Hope you can help.
Kind regards,

Mārtiņš Balodis

Aug 26, 2016, 6:13:24 AM
to Mario Omescu, Web Scraper
Hi,

1. I probably wrote it manually. You can copy and paste the selector.

2. The Element click selector should be the solution.

3. The Element click selector should work here as well.


leo chu

Jan 22, 2017, 4:12:15 AM
to Web Scraper, omescu.ma...@gmail.com
Hi,

For 2, I have the same issue: I can only scrape the first week. I think it is because the previous-week button doesn't change after it is clicked, so none of the "click element uniqueness" options are suitable.

I tried selecting both the previous button and the week text for the click selector, but had no luck. Do you have any ideas?

This is my sitemap:
{"startUrl":"https://www.whoscored.com/Regions/252/Tournaments/7/-Championship","selectors":[{"parentSelectors":["_root"],"type":"SelectorElementClick","multiple":true,"id":"matches","selector":"a.result-1","clickElementSelector":"a.previous span.ui-icon, a.date span.text","clickElementUniquenessType":"uniqueHTML","clickType":"clickMore","discardInitialElements":false,"delay":""},{"parentSelectors":["matches"],"type":"SelectorLink","multiple":false,"id":"match","selector":"_parent_","delay":""}],"_id":"whoscored_matches"} 

Thanks,
Leo