How to scrape Wix websites

George Suen Costanza

Jan 17, 2015, 11:16:31 PM
to web-s...@googlegroups.com
Has anyone had success scraping a Wix website?
If so, please share some tips.

Thanks.

Mike Milner

Jan 19, 2015, 5:06:45 AM
to web-s...@googlegroups.com
I would like to know as well, please.

Mārtiņš Balodis

Jan 20, 2015, 4:10:23 AM
to Mike Milner, web-scraper
Hi,
Try watching the video tutorials and following the same steps on the Wix site. If you run into problems, post an exported sitemap here.

Mike Milner

Jan 20, 2015, 4:55:12 AM
to web-s...@googlegroups.com, popu...@gmail.com
I have watched all your videos many times and I use Web Scraper on a daily basis, but I always fail to scrape Wix websites.
For example, this one: http://www.atelierdeparis.com.hk/
I used this exported sitemap:
{"selectors":[{"parentSelectors":["_root"],"type":"SelectorLink","multiple":true,"id":"cat","selector":"a.s1repeaterButton","delay":""},{"parentSelectors":["cat"],"type":"SelectorLink","multiple":true,"id":"items","selector":"div.s55_galleryDisplayer a","delay":""},{"parentSelectors":["items"],"type":"SelectorText","multiple":false,"id":"title","selector":"p.font_6","regex":"","delay":""},{"parentSelectors":["items"],"type":"SelectorText","multiple":false,"id":"price","selector":"p.font_4 strong","regex":"","delay":""},{"parentSelectors":["items"],"type":"SelectorText","multiple":false,"id":"descr","selector":"div#appPartZoomCompprd1appPart1zvc_ZoomGalleryLeft_ProductBundle__0_0_overview.s5 span","regex":"","delay":""},{"parentSelectors":["items"],"type":"SelectorHTML","multiple":false,"id":"size","selector":"div.s68","regex":"","delay":""},{"parentSelectors":["items"],"type":"SelectorImage","multiple":false,"id":"img1","selector":"div#ZoomGalleryLeft_wixImage__0_0_0 img.s10imgimage","downloadImage":false,"delay":""},{"parentSelectors":["items"],"type":"SelectorImage","multiple":false,"id":"img2","selector":"div#ZoomGalleryLeft_wixImage__0_0_1 img.s10imgimage","downloadImage":false,"delay":""},{"parentSelectors":["items"],"type":"SelectorHTML","multiple":false,"id":"color","selector":"div.s68","regex":"","delay":""},{"parentSelectors":["items"],"type":"SelectorImage","multiple":false,"id":"img3","selector":"div#ZoomGalleryLeft_wixImage__0_0_2 img.s10imgimage","downloadImage":false,"delay":""},{"parentSelectors":["items"],"type":"SelectorImage","multiple":false,"id":"img4","selector":"div#ZoomGalleryLeft_wixImage__0_0_3 img.s10imgimage","downloadImage":false,"delay":""},{"parentSelectors":["items"],"type":"SelectorImage","multiple":false,"id":"img5","selector":"div#ZoomGalleryLeft_wixImage__0_0_4 img.s10imgimage","downloadImage":false,"delay":""},{"parentSelectors":["items"],"type":"SelectorImage","multiple":false,"id":"im6","selector":"div#ZoomGalleryLeft_wixImage__0_0_5 img.s10imgimage","downloadImage":false,"delay":""},{"parentSelectors":["items"],"type":"SelectorImage","multiple":false,"id":"img7","selector":"div#ZoomGalleryLeft_wixImage__0_0_6 img.s10imgimage","downloadImage":false,"delay":""},{"parentSelectors":["items"],"type":"SelectorImage","multiple":false,"id":"img8","selector":"div#ZoomGalleryLeft_wixImage__0_0_7 img.s10imgimage","downloadImage":false,"delay":""}],"startUrl":"http://www.atelierdeparis.com.hk/","_id":"atelierparis"}
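
As a side check on a sitemap like the one above, a short script can flag selector ids that share the same CSS selector. In this sitemap, `size` and `color` both point at `div.s68`, so they would return the same content. This is only a minimal sketch run on a trimmed excerpt of the sitemap; the helper name `duplicate_selectors` is hypothetical and not part of Web Scraper.

```python
import json
from collections import defaultdict


def duplicate_selectors(sitemap_json):
    """Map each CSS selector used by more than one selector id to those ids."""
    sitemap = json.loads(sitemap_json)
    ids_by_css = defaultdict(list)
    for sel in sitemap["selectors"]:
        ids_by_css[sel["selector"]].append(sel["id"])
    # Keep only CSS selectors that appear under two or more ids.
    return {css: ids for css, ids in ids_by_css.items() if len(ids) > 1}


# Trimmed excerpt of the exported sitemap (only the fields used here).
excerpt = json.dumps({
    "_id": "atelierparis",
    "startUrl": "http://www.atelierdeparis.com.hk/",
    "selectors": [
        {"id": "title", "selector": "p.font_6"},
        {"id": "price", "selector": "p.font_4 strong"},
        {"id": "size", "selector": "div.s68"},
        {"id": "color", "selector": "div.s68"},
    ],
})

duplicates = duplicate_selectors(excerpt)
print(duplicates)  # {'div.s68': ['size', 'color']}
```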

Mārtiņš Balodis

Jan 22, 2015, 3:50:00 AM
to Mike Milner, web-scraper
Hi,
I am sorry, but this page cannot be scraped with Web Scraper right now. The website is completely AJAX-based and uses URL hashes (the part of the URL after "#") for navigation. The Link selector ignores these hashes, which is why it isn't navigating the site. I have added a feature request for this.
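
The point about hash-based navigation can be illustrated with a small sketch. It is not Web Scraper's actual code; it only shows what happens when a crawler drops the part of a URL after "#" before deciding which pages to visit. The three hash paths below are hypothetical examples and were not taken from the site.

```python
from urllib.parse import urldefrag


def unique_pages(links):
    """Return the distinct URLs left after dropping the #fragment from each link."""
    seen = []
    for link in links:
        base, _fragment = urldefrag(link)
        if base not in seen:
            seen.append(base)
    return seen


start_url = "http://www.atelierdeparis.com.hk/"

# Hypothetical hash-based links of the kind an AJAX site uses for navigation.
links = [
    start_url + "#!shop/c1",
    start_url + "#!about/c2",
    start_url + "#!contact/c3",
]

pages = unique_pages(links)
print(pages)  # ['http://www.atelierdeparis.com.hk/']
```

With the hash removed, all three links collapse to the start URL, so a crawler that compares URLs this way sees no new page to follow.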

Mike Milner

Jan 22, 2015, 3:54:43 AM
to web-s...@googlegroups.com, popu...@gmail.com
Understood, thank you Mārtiņš.