Problems extracting text from a popup window


Pioto

Jan 21, 2015, 4:58:50 AM
to web-s...@googlegroups.com
Hello!

I am scraping a site and was able to solve most problems with trial and error, but now I have come to a point where I don't know how to continue.

I am working on this page:

http://goo.gl/3vxAdN

I would like to scrape the information behind the list of popup links. The first link is Bil_006547.

I am able to select all links on the page. That was quite tricky: I used the CSS path and experimented until I found "div nobr". So the element and data previews work fine. When I click on a link, a new window opens. In the example there are three numbers (35-115908, 35-115915, 24-115926), which I would like to put into a single record if possible (one record each is OK too). I don't know how to scrape these numbers.

I hope someone here can help me. I can post the whole site list if necessary.

Thanks for your help
Andreas

Mārtiņš Balodis

Jan 22, 2015, 4:33:12 AM
to Pioto, web-scraper
Hi,
The example link isn't showing the data you are describing. Probably the links are unique per user session. I found a page that has links like "Bil_<num>". These links open a popup within the page, but they can also be used for navigation, so you should try using a Link selector for them. To select elements on the linked pages, right-click one of these links and choose "Open link in new tab", then continue with the element selection in the new tab. You can use a Grouped selector to store multiple text elements in a single record.


Pioto

Jan 28, 2015, 4:19:45 PM
to web-s...@googlegroups.com, kitupii...@googlemail.com
Hi Martins,

Thanks for the answer. Unfortunately I could not make the scraping process work. I think I am pretty close, but I cannot figure out the last step.

Here is where I am stuck:
I hope the link works this time. But you were on a suitable page anyway.
There are three elements to be scraped on that page. The first one is the link Bil_016140.
I need the entire text from the popup window that opens, e.g. in a group element. I tried to follow your instructions. However, if I select a popup link, the popup window stays open and the scraping process freezes, whereas with a normal link no window opens and no scraping takes place.

I would be very grateful for help here. 

Thanks Martins!
A.

Mārtiņš Balodis

Jan 30, 2015, 9:59:06 AM
to Pioto, web-scraper
Hi,
The link isn't working for me. Here is a sitemap that goes into the "Bil" link and extracts data. You will have to change the selectors if you want the sitemap to go to all car models.

{"_id":"carparts","startUrl":"http://web1.carparts-cat.com/default.aspx?11=18&10=097FB767A5004ACC8A8C9A9A8A857822018001&14=1&12=100","selectors":[{"parentSelectors":["_root"],"type":"SelectorLink","multiple":true,"id":"make-link","selector":"div.liste_spalte_daten:nth-of-type(1) div.panel_link:nth-of-type(4) a","delay":""},{"parentSelectors":["make-link"],"type":"SelectorLink","multiple":true,"id":"model-link","selector":"tr.row:nth-of-type(7) td.cell:nth-of-type(1) a","delay":""},{"parentSelectors":["model-link"],"type":"SelectorLink","multiple":true,"id":"model-type-link","selector":"tr.row:nth-of-type(5) a","delay":""},{"parentSelectors":["model-type-link"],"type":"SelectorLink","multiple":true,"id":"panel-link","selector":"div.panel_baugru_no_gru a","delay":""},{"parentSelectors":["panel-link"],"type":"SelectorLink","multiple":true,"id":"subpanel-link","selector":"div.panel_baugru_no_gru:nth-of-type(2) a","delay":""},{"parentSelectors":["subpanel-link"],"type":"SelectorLink","multiple":true,"id":"bil-link","selector":"#content_table_pnl .pnl_link_eartnr>a","delay":""},{"parentSelectors":["bil-link"],"type":"SelectorText","multiple":true,"id":"artikelnr","selector":"td.article_nr_cell","regex":"","delay":""},{"parentSelectors":["subpanel-link"],"type":"SelectorHTML","multiple":false,"id":"details","selector":"div.fzg_detail_small","regex":"","delay":""}]}
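
For the single-record case asked about at the start of the thread, one possible variation is to swap the "artikelnr" Text selector for a Grouped selector, as suggested earlier. The fragment below is only a sketch of that one entry in the "selectors" array. The "td.article_nr_cell" selector and the "bil-link" parent are taken from the sitemap above; the exact set of fields ("multiple", "delay", "extractAttribute") is an assumption modelled on the other selectors in the export and may differ between extension versions.

```json
{
  "parentSelectors": ["bil-link"],
  "type": "SelectorGroup",
  "multiple": false,
  "id": "artikelnr",
  "selector": "td.article_nr_cell",
  "delay": "",
  "extractAttribute": ""
}
```

With a Grouped selector, the texts of all matching elements should end up together in the "artikelnr" field of a single record per "Bil" page, rather than as one record per number.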