ParseHub cannot get images from carousel


Leonard Lie

Oct 17, 2020, 6:31:26 PM
to Web Scraping
Hello guys,

I'm trying to extract all the images in the carousel on this link:
https://www.kawai-global.com/product/cn29/
When you go to the link, there's a Gallery button you can click,
which opens a popup showing the image carousel.
For some reason ParseHub is not getting all the images.

I tried the guide from ParseHub.
I use 'select gallery button' and 'click gallery button item'
and make a new template.
Then I select the first image, turn on browse mode,
click next, and turn off browse mode. Then I click on the second image,
but the selected item count doesn't become 2; it goes back to 1.

Any help anyone? Thank you.

Andrew11

Oct 18, 2020, 9:55:46 AM
to Web Scraping
I've actually never had to save images from ParseHub, but if it's just a matter of Select-ing the carousel images, you can try using the CSS:

.slides > li[data-thumb-alt] > picture img

Rombout Versluijs

Nov 26, 2020, 6:20:46 PM
to Web Scraping
@andrew11

It doesn't seem to work as easily as in their example. I've been busy with this for a couple of hours now. The first issue I run into is that I have a carousel where I can't select the image, due to an extra div inside the active div. When selecting the image according to their example, it always selects the class "imag-active".

My second approach was using CSS. I got the class ".gallery__image" and I do see the proper number of images (24). When I then use Select for the attribute, it selects the same single image?! Seems like a bug to me.
Most of the time you can't select an image in a carousel because of those arrows; they lie on top of the image. I don't understand how they can make that example work.

This is the help article pointed out earlier. I tried both methods and still can't get it to work:
https://help.parsehub.com/hc/en-us/articles/360007553274-Scraping-all-images-from-an-image-carousel

Andrew11

Nov 26, 2020, 6:42:27 PM
to Web Scraping
I only see 3 images in the carousel that opens when you click on Gallery (this is when looking at the page with a regular web browser). All those extras are probably different sized versions of the same jpeg for showing on different size screens. If you Edit the Selection Node (under where it says Get Data) and switch it to CSS, I think this selects the 3 different images when you paste it in the Selector:

.slides > li[data-thumb-alt] > picture img
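
To make it easier to see what that selector picks out, here is a rough Python sketch that applies the same rule to some made-up markup. The HTML below is hypothetical (it is not copied from the Kawai page) and only assumes a responsive carousel where each slide holds a `<picture>` with extra `<source>` sizes; the class and function names are just for illustration.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"img", "source", "br", "hr", "input", "meta", "link"}

# Hypothetical markup, loosely modelled on a responsive carousel slide list.
SAMPLE_HTML = """
<ul class="slides">
  <li data-thumb-alt="front">
    <picture>
      <source srcset="front-small.jpg" media="(max-width: 600px)">
      <img src="front-large.jpg">
    </picture>
  </li>
  <li data-thumb-alt="side">
    <picture>
      <source srcset="side-small.jpg" media="(max-width: 600px)">
      <img src="side-large.jpg">
    </picture>
  </li>
  <li class="clone">
    <picture><img src="front-large.jpg"></picture>
  </li>
</ul>
"""


class CarouselImageParser(HTMLParser):
    """Collects img src values matching
    .slides > li[data-thumb-alt] > picture img"""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, attrs dict) for every open element
        self.images = []

    def _in_matching_picture(self):
        # Look for the chain .slides > li[data-thumb-alt] > picture
        # anywhere among the currently open ancestors.
        for i in range(len(self.stack) - 2):
            parent, item, picture = self.stack[i:i + 3]
            if ("slides" in (parent[1].get("class") or "").split()
                    and item[0] == "li" and "data-thumb-alt" in item[1]
                    and picture[0] == "picture"):
                return True
        return False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and self._in_matching_picture():
            self.images.append(attrs.get("src"))
        if tag not in VOID_TAGS:
            self.stack.append((tag, attrs))

    def handle_endtag(self, tag):
        # Pop back to the most recent open tag with this name.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break


def carousel_images(html):
    parser = CarouselImageParser()
    parser.feed(html)
    return parser.images


print(carousel_images(SAMPLE_HTML))
```

In this sample the `<source>` sizes and the slide without `data-thumb-alt` are skipped, so only one `img` per real slide comes back. Whether the live page behaves the same way depends on its actual markup.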

Andrew11

Nov 26, 2020, 6:44:08 PM
to Web Scraping
If you mean a different website than Kawai, go ahead and post the link. Thanks!


Rombout Versluijs

Nov 26, 2020, 11:30:34 PM
to Web Scraping
@andrew11

Thanks for the fast reply!!!
This is the website I was talking about. https://www.probo.nl/beurswanden

I think I got something working now after fiddling for a couple more hours. If you check the carousel at the top of the page, you will notice you can't select the image. I tried using classes but wasn't able to get it to work. I think I somehow missed the point that you need to run a second select inside the first select to be able to grab them all. That is kind of fuzzy in their documentation; they only use really simple examples.

b...@parsehub.com

Nov 27, 2020, 11:00:54 AM
to Web Scraping
Hello,
You do not need a selection within another selection to scrape these images. You can simply use this custom CSS selector, followed by a Begin new entry command and an Extract command (see attached). The Begin new entry command is required to extract each image on a different row. Without it, each entry will overwrite the previous one and the output will simply be the URL of the last image.
Begin new entry commands are created automatically in ParseHub when you make a Select command that selects multiple elements, but they are not created automatically when you use custom selectors, so they need to be added manually.
Let me know if you have any questions about this.
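
As a rough analogy for the overwrite behaviour described above, here is a small Python sketch. It is not how ParseHub is implemented; the URLs and function names are made up purely to show the difference between writing every match into one entry and starting a new entry per match.

```python
# Hypothetical image URLs standing in for what the selector matches.
matched_urls = ["img-1.jpg", "img-2.jpg", "img-3.jpg"]


def extract_without_new_entry(urls):
    """Every match writes into the same single entry, so only the last survives."""
    entry = {}
    for url in urls:
        entry["image"] = url  # overwrites the previous value
    return entry


def extract_with_new_entry(urls):
    """Each match starts its own entry, giving one row per image."""
    rows = []
    for url in urls:
        rows.append({"image": url})  # a fresh entry per match
    return rows


print(extract_without_new_entry(matched_urls))
print(extract_with_new_entry(matched_urls))
```

The first call ends up with only the last URL, the second with one row per image, which is the difference the Begin new entry command makes in the output.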

Screen Shot 2020-11-27 at 10.58.07 AM.png

Rombout Versluijs

Nov 30, 2020, 4:39:04 PM
to Web Scraping
Hi, thanks for looking into it.

I also tried that, using the thumbnails and then a replace command to get the higher-res images. This would work if you scrape a single page, but I need to scrape lots of pages and the path differs per page.

Therefore I'm using the class names to scrape the data. Something weird happens too: when I use the proper class name I can see it finds, say, 24 images, but when scraping they are all the same. So I used an extra loop, or at least I think that's what I did. First I select the main container and then use the class of the divs that have a background image. I then also grab the hi-res retina image.

Thanks again for trying!!!

Something else that's weird: even though I renamed the select command, it keeps going back to the default name "select". Kinda weird?!
Screen Shot 2020-11-30 at 17.33.43.png

Andrew11

Dec 1, 2020, 9:18:09 AM
to Web Scraping
Go ahead and post the .phj file that you get from Settings -> Export project. But I think Ben is right about the Begin new entry command. You put it so it's indented under Select Images, and then the extract commands should be indented under that.

Rombout Versluijs

Dec 1, 2020, 4:12:08 PM
to Web Scraping
The problem is that I can't use it on that main div. In almost 95% of cases, when selecting a carousel's main image you end up selecting the left and right navigation buttons; that's how most are set up. In my case, you'll notice there is a small div in the top left corner, so I can't select the image and use the select method. That's why I reverted to CSS selectors.

Using CSS as the select method I don't get that "New entry command"; it only seems to show up when using the XPath method, and I can't do that in this case.

Okay, after checking again, my workaround is basically the same as using a CSS selector with Begin new entry. It does the same thing. See the screengrab below: using that "new entry command" I get the same result as what I do in just one step.

Screen Shot 2020-12-01 at 17.10.23.png

Andrew11

Dec 1, 2020, 5:33:19 PM
to Web Scraping
Maybe something different is going on. I know the plus button doesn't always appear when it should, but you can get around that by using it on a different command where it does show, and then dragging or CTRL-X / CTRL-V copy pasting it where it should be. If you post that .phj file I'll see if I can find out what's going wrong.

Rombout Versluijs

Dec 2, 2020, 11:55:59 AM
to Web Scraping
Thanks andrew11!

Yeah, I thought about copy/paste, but then it would still have the wrong link. I can't select the image, since it uses a div with the image set as a background image, inside a div which has the active class. I guess that's bugging it a bit. See attached screengrab. Notice the div with the image is inside another div which has the active state.
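
For anyone post-processing the scraped style attribute outside ParseHub, here is a minimal Python sketch for pulling the URL out of an inline background image. The style strings are hypothetical, and the regex only covers the common `url(...)` forms, not every valid CSS variant.

```python
import re

# Matches url(...) inside a background or background-image declaration,
# with optional single or double quotes around the URL.
BACKGROUND_URL = re.compile(
    r"background(?:-image)?\s*:[^;]*?url\(\s*['\"]?(.*?)['\"]?\s*\)",
    re.IGNORECASE,
)


def background_image_url(style):
    """Return the URL from an inline style string, or None if there is none."""
    match = BACKGROUND_URL.search(style)
    return match.group(1) if match else None


# Hypothetical style attribute values.
print(background_image_url('background-image: url("https://example.com/a.jpg"); width: 100%'))
print(background_image_url("background: #fff url('x.png') no-repeat"))
print(background_image_url("color: red"))
```

The last call returns None because that style has no background image at all.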

I've also attached the project file. If you open it, click any product on that overview page. Then click the 2nd template, called "Probo project page 02".

Screen Shot 2020-12-02 at 12.51.28.png

Probo_Products.phj.zip

Rombout Versluijs

Dec 2, 2020, 11:56:48 AM
to Web Scraping
PS: posting project files isn't that smooth; Google doesn't like it. When I zip the file I can post it, otherwise it keeps returning an error.

Anyways, the project file is attached

Andrew11

Dec 2, 2020, 12:45:11 PM
to Web Scraping
Do you mean how the left columns go blank when it's time to extract the rightmost image/retina columns? You can fix that by making the two Begin new entry's point to the same list, which I called "details" in the edited file. Let me know if you have a different problem though. -A
Probo_Products2.zip

Andrew11

Dec 2, 2020, 12:57:28 PM
to Web Scraping
Is it extracting the wrong links when you compare the jpegs you download to what shows on the browser's page?


Rombout Versluijs

Dec 2, 2020, 3:04:28 PM
to Web Scraping
Sorry, perhaps I wasn't clear in my answer. My approach now works, after a lot of trial and error.
What I was trying to point out is that the example in the documentation uses a different approach, which most of the time won't work.
Using the XPath method I can't select the image; it selects the div before the div containing the background image.
The other trick, using the thumbnails and then replacing the path, also won't work. That's because the path is different for each page, so it only works per page and you would need to adjust the replace action for every page.
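
To illustrate why a fixed replace only works per page, here is a small Python sketch. The URL layout (a page-specific folder followed by "thumbs") is entirely hypothetical and not taken from the Probo site, and the thread does not confirm that a pattern like the second one would hold there.

```python
import re

# Hypothetical thumbnail URLs: the folder before "thumbs" changes per page.
page_a_thumb = "https://example.com/media/1234/thumbs/stand-01.jpg"
page_b_thumb = "https://example.com/media/5678/thumbs/wall-02.jpg"


def fixed_replace(url):
    """Replace tuned for page A only: the page-specific folder is hard-coded."""
    return url.replace("/media/1234/thumbs/", "/media/1234/full/")


def pattern_replace(url):
    """Swap only the last 'thumbs' folder, whatever path comes before it."""
    return re.sub(r"/thumbs/(?=[^/]+$)", "/full/", url)


print(fixed_replace(page_a_thumb))    # rewritten
print(fixed_replace(page_b_thumb))    # unchanged, the hard-coded folder does not match
print(pattern_replace(page_b_thumb))  # rewritten regardless of the folder
```

If the differing part of the real paths follows no pattern at all, even the second approach would break, which matches the problem described above.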

PS: my setup for the data is different; I'm not using a per-entry layout. That's because I need to copy and paste the data over to HTML pages. Since many pages have different lengths of specs, I thought this approach would be handier for the copy/paste I need to do. This way the line returns also stay intact and I can more easily add attributes using Visual Studio Code and Emmet.

Your method also extracts the images, same as my method. I did find the "Begin new entry" command, but it didn't make a difference in the output.
Thanks for trying as well.

Rombout Versluijs

Dec 2, 2020, 3:07:38 PM
to Web Scraping
I think it would be useful for others if an example using CSS to extract images from a carousel were added to the help article.
It's this page. I'm not sure whether you have any connection to ParseHub; I assumed so because you seem to know the app well.
https://help.parsehub.com/hc/en-us/articles/360007553274-Scraping-all-images-from-an-image-carousel

Neither of the examples there works on the carousel of the website I was trying to scrape.

Andrew11

Dec 2, 2020, 3:37:51 PM
to Web Scraping
Oh, I get it. No worries, I'm not a ParseHub rep but Ben is and he gets cc'd on all these emails. Good luck with future scrapes!

Rombout Versluijs

Dec 2, 2020, 7:40:07 PM
to Web Scraping
Thanks again Andrew!!!
Much appreciated. 
