Variables to select & last or final scroll

Adrian

Apr 23, 2021, 3:57:55 PM
to Web Scraping
Hi guys, do any of you know if it's possible to use variables in conjunction with the CSS Selector?

I'm trying to select a table row which is :nth-child(N) in that table.

N is a variable set with Extract before fetching the row data, starting at 1, and then I do another Extract command with N+1.

To fetch the data I try to use .row:nth-child(N) to tell the worker where to search for the next row.
I need this because the table uses infinite scroll and re-renders the entire list every time a scroll loads new data, so the tutorial approach that uses delete doesn't fit this scenario (it produces duplicate entries and increases the workload).
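ParseHub may not accept a variable inside a CSS selector at all (the replies below suggest it doesn't), so this is not a ParseHub recipe. As a tool-neutral illustration of the counter idea, here is a minimal Python sketch that builds `.row:nth-child(N)` for N = 1, 2, 3, ... and stops at the first N that matches nothing. The `select` callable and the dict-based `fake_page` are hypothetical stand-ins for a real page. Note that in CSS `:nth-child(N)` counts all siblings, so `.row:nth-child(N)` matches only when the Nth child also has the `row` class.

```python
from typing import Callable, List, Optional


def collect_rows(select: Callable[[str], Optional[str]], start: int = 1) -> List[str]:
    """Extract rows one at a time via `.row:nth-child(N)`, stopping at the first miss."""
    rows: List[str] = []
    n = start
    while True:
        selector = f".row:nth-child({n})"  # the "variable inside the selector"
        text = select(selector)
        if text is None:  # no Nth row: assume the bottom has been reached
            break
        rows.append(text)
        n += 1  # the "N + 1" step
    return rows


# Hypothetical stand-in for a page: maps each selector to that row's text.
fake_page = {f".row:nth-child({i})": f"row {i}" for i in range(1, 4)}
print(collect_rows(fake_page.get))  # ['row 1', 'row 2', 'row 3']
```

One caveat: on an infinite-scroll table a missing Nth row may only mean the next batch hasn't loaded yet, so stopping at the first miss can quit early; comparing the last row before and after a scroll, as discussed later in the thread, is a sturdier end signal.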

Lastly, I haven't found a way to check whether a scroll is the last one possible. My plan is for the worker to simply stop running once there is no nth-child left to select, which would mean it has reached the bottom.

I've attached pics of my ParseHub tree for some visual aid; please let me know what other alternatives there are. Perhaps XPath accepts variables the way I'm trying to use them.

Thank you for your time,
Adrián.
Screenshot 2021-04-23 164957.png
Screenshot 2021-04-23 164957-2.png

Andrew11

Apr 23, 2021, 5:14:59 PM
I don't think you can combine selectors and variables, unfortunately. If I'm wrong I'd love to know, and maybe Ben from ParseHub will clue us in. For the infinite scroll, could you extract .row:last and see if it's changed between steps, or will that always be the same row? Feel free to post the URL too if you're stuck.

Premier Logik

Apr 26, 2021, 3:34:54 PM
I need to scrape the title, description, application URL and location. The link is https://sa.indeed.com/Saudi-Arabia-jobs

Andrew11

Apr 26, 2021, 3:58:17 PM
This one's halfway done -- maybe it's enough to get you started? Go to Projects -> Import project to load it. You'll need to make another template to handle pages after the first one that loads, and make sure that the data shows up for a bunch of different job entries.
sa.indeed.com_Project.zip

Premier Logik

Apr 27, 2021, 2:51:58 PM
This template does not scrape the job details completely; only about half of each description, or even less, gets scraped.
Please help me in this regard.
Thanks.

SAEED

Andrew11

Apr 27, 2021, 2:54:07 PM
Do you mean the description gets cut off, or that some columns are missing when you look at the spreadsheet overall? Send the URL of an example page where it happens, too.

Premier Logik

Apr 28, 2021, 12:50:22 PM
Yes, I'm facing the issue of descriptions getting cut off.

Andrew11

Apr 28, 2021, 12:54:29 PM
Try changing the Description selection to this CSS selector:
.jobsearch-JobComponent-description

Adrian

Apr 29, 2021, 11:10:05 AM
The last row will always be different after each scroll until it can't scroll any further, which would mean it's finished.
Can JavaScript be used in this context? Where could I write that code?

Andrew11

Apr 29, 2021, 11:38:34 AM
It's not exactly JavaScript, but once you extract a variable it's available from other steps in the same template, including conditionals. So do something like Select & Extract lastRow and compare it to previousLastRow in an If statement. If you like, post the URL again and I'll try to make a sample. I don't have to use this technique very often, but I think it should work.
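The lastRow / previousLastRow comparison boils down to a simple loop: scroll, read the last row, and stop once it equals the value from the previous iteration. Below is a minimal Python sketch of that logic only, not of ParseHub's command tree. `scroll_and_get_last_row` and `make_fake_table` are hypothetical stand-ins for the scroll and extract steps, and the `max_scrolls` cap is an added assumption to avoid looping forever.

```python
from typing import Callable, Optional


def scroll_until_stable(
    scroll_and_get_last_row: Callable[[], str], max_scrolls: int = 1000
) -> int:
    """Scroll until the last row stops changing; return how many scrolls were made."""
    previous_last_row: Optional[str] = None
    for scrolls in range(1, max_scrolls + 1):
        last_row = scroll_and_get_last_row()
        if last_row == previous_last_row:
            # Nothing new loaded since the previous scroll: assume the end.
            return scrolls
        previous_last_row = last_row
    return max_scrolls  # safety cap reached before the list stopped changing


def make_fake_table(total_rows: int, batch: int) -> Callable[[], str]:
    """Hypothetical infinite-scroll table that loads `batch` rows per scroll."""
    loaded = 0

    def scroll() -> str:
        nonlocal loaded
        loaded = min(loaded + batch, total_rows)
        return f"row {loaded}"  # text of the current last row

    return scroll


print(scroll_until_stable(make_fake_table(45, 20)))  # 4
```

With 45 rows loaded 20 at a time, three scrolls reach the bottom and a fourth confirms that the last row no longer changes, so the stop rule always costs one extra scroll.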

Premier Logik

Apr 29, 2021, 1:51:08 PM
https://sa.indeed.com/Saudi-Arabia-jobs is the URL. Please try to include pagination as well.

Andrew11

Apr 29, 2021, 2:19:56 PM
No, I meant the URL of a page where the description still gets cut off. It would be on one of the job detail pages. I have to leave something for you to do because this is a volunteer forum. If you need help figuring out how to do it, that's a different thing. Thanks!

Adrian

Apr 29, 2021, 5:19:41 PM
I'm able to play around with variables and check the last row, but it needs to scroll down far enough to reveal over 4000 rows, and it only fetches 20 or so each time it scrolls to the bottom.

I've made it scroll down until the last ticker is the same as the previous last ticker and then, via an If command, jump to another template to extract the data. However, it seems to jump out of the scroll template (with the condition mentioned above) at some point: after 95 or so scrolls (judging from the Data Run window in the app), it exits and returns empty result data.

In the end it should extract the 4000+ items "in one go". But given the numbers, that may be out of scope for the free version, am I correct?

Side note: I've noticed this thread is getting messy but don't know how to arrange it now.

Andrew11

Apr 29, 2021, 6:11:04 PM
Sounds interesting, can you export the project and post it in a zip file? If the scraper has to open each of the 4000 items, that'll need the upgrade. If each scroll counts as a page fetch, that's about 200 (4000 rows / 20 per scroll), which is right up against the limit. I think you can tell from the numbers on the Run page.
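To sanity-check the estimate: the number of scrolls is the row count divided by the rows loaded per scroll, rounded up. A minimal Python sketch using the approximate figures from the thread (4000 rows, about 20 per scroll). Two assumptions: every scroll counts as one page fetch, and a stop rule that compares the last row with the previous one needs one extra confirming scroll. The initial page load is not counted.

```python
import math


def scrolls_needed(total_rows: int, rows_per_scroll: int, confirm_scroll: bool = True) -> int:
    """Estimate page fetches for an infinite-scroll list, one fetch per scroll."""
    scrolls = math.ceil(total_rows / rows_per_scroll)
    # Assumed: one more scroll to observe that the last row stopped changing.
    return scrolls + 1 if confirm_scroll else scrolls


print(scrolls_needed(4000, 20, confirm_scroll=False))  # 200
print(scrolls_needed(4000, 20))  # 201
```

Since the post says "over 4000 rows", these figures are a lower bound.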

Premier Logik

May 2, 2021, 1:39:43 PM
OK,
Andrew, thank you so much. The cut-off issue only occurs in the CSV; the JSON shows the full description. Now I need help with pagination in the same project. I would be pleased if you could help me.

Thank you so much again for your help. I have attached the project files. I'm not a developer, which is why I need your help with this project, but I'm moving forward with learning Node.js on priple.com. If there are any learning materials for data scraping, please let me know.

Regards,

SAEED

sa.indeed.com_Project.zip

Andrew11

May 2, 2021, 1:56:47 PM
OK, here you go! There are so many pages, though, that I wonder if it'll run into trouble at some point. Let me know how it goes!
sa.indeed.com_Project.zip

Andrew11

May 3, 2021, 1:05:58 PM
Hi again Saeed, I don't know of any books about ParseHub like you asked for above, but the company now has these online training modules.
