How to get a hidden URL accessible through a link stubbed out with '#'

liugg

Apr 25, 2017, 9:35:20 AM

to nokogiri-talk

I would like to use Nokogiri to scrape data from the WTA website.
The information I am looking for is the singles rankings and the singles road rankings.
The rankings are accessible through the following link: http://www.wtatennis.com/rankings

Just below the pictures of the top-ranked singles player and the top-ranked road singles player, there is a menu for accessing the related categories: single, doubles, road singles, road doubles. The problem is that clicking these links does not change the URL in the browser's address bar. The web inspector shows that all of these links are <a> tags with href="#".

The link http://www.wtatennis.com/rankings, however, belongs to the singles rankings page, so I assume there must be a hidden link for the road rankings page.
The web inspector shows that they are two different documents; they are not included in the same web document.

I also need the URLs behind the pagination, in order to get information about players ranked beyond number 100.
The corresponding numbered links are below the list of players.
These links are also stubbed out with '#':

<a class="footable-page-link" href="#">1</a>
<a class="footable-page-link" href="#">2</a>
<a class="footable-page-link" href="#">3</a>
...

Is there any way to get these URLs?
I have no idea how they are generated. At first I thought of tabbed navigation, but then there would be a single page divided into sections, each reachable through a unique id.

Mike Dalessio

Apr 30, 2017, 5:50:43 PM

to nokogiri-talk

Hi,

Thanks for asking this question. Can you share the work you've done so far? Sharing your code will allow other members of the list to provide more directed and helpful advice.

-m

liugg

May 2, 2017, 4:41:57 AM

to nokogiri-talk

The code largely depends on the missing URLs.
Suppose url and url2 are, respectively:

- the URL of the WTA singles race rankings (players 1 to 1000)
- the URL of the WTA singles rankings (players 1 to 1000)

If I knew these URLs, I would use Nokogiri to scrape the name, points and rank of each player.
The singles race rankings are useful for keeping an eye on a player's performance:

require 'nokogiri'
require 'open-uri'

# Placeholder URL: the real address of the race rankings is not known yet.
url = "http://www.wtatennis.com/single-race/pag/1-1000"
doc = Nokogiri::HTML(URI.open(url))
range = (1..1000)

range.each do |num|
    # Name and points of the player in table row `num`.
    wtaname   = doc.at_css("tr:nth-child(#{num}) .player a").text
    wtapoints = doc.at_css("tr:nth-child(#{num}) .points").text.to_i

    # Update the player's points if already stored, otherwise create the record.
    tennist = WtaRace.find_by(name: wtaname)
    if tennist.present?
        tennist.update_attribute(:newpoints, wtapoints)
    else
        WtaRace.create!(name: wtaname, oldpoints: wtapoints, newpoints: wtapoints)
    end
end

The singles rankings are useful for knowing each player's overall ranking:

# Placeholder URL: the real address of the singles rankings is not known yet.
url2  = "http://www.wtatennis.com/singles-rankings/pag/1-1000"
doc2  = Nokogiri::HTML(URI.open(url2))

range.each do |num|
    # Ranking, name and points of the player in table row `num`.
    wtaranking = doc2.at_css("tr:nth-child(#{num}) .footable-first-visible").text.to_i
    wtaname    = doc2.at_css("tr:nth-child(#{num}) .player a").text
    wtapoints  = doc2.at_css("tr:nth-child(#{num}) .points").text.to_i

    # Update ranking and points if the player is already stored, otherwise create the record.
    wtatennist = WtaRank.find_by(name: wtaname)
    if wtatennist.present?
        wtatennist.update_attribute(:ranking, wtaranking)
        wtatennist.update_attribute(:points, wtapoints)
    else
        WtaRank.create!(ranking: wtaranking, name: wtaname, points: wtapoints)
    end
end
