Ajax website crawling


Saurabh Kumar

Nov 25, 2015, 8:16:21 AM
to SOFTplus GSiteCrawler
Hello,
I work for a website which loads its content using AJAX. While Google has crawled 3000+ pages of it, GSiteCrawler is unable to go beyond the homepage.
Is there a way to get GSiteCrawler to crawl such content?


webado

Nov 25, 2015, 8:22:52 AM
to SOFTplus GSiteCrawler
Unfortunately there's no way.

Chris Wright

Nov 25, 2015, 3:15:24 PM
to gsitec...@googlegroups.com
Here's a handy dandy Google Developers article on making AJAX-based websites 'crawlable'




It's pretty heavy reading, and it can be difficult to implement on a full-grown website unless you run some kind of CMS that can easily be modified to output the desired data for search engines.

Now, I can't say for certain that GSiteCrawler will be able to make use of the 'new' alternate content, but there are other scrapers/tools out there which can.

The main reason for posting the above is that if GSiteCrawler can't crawl your content, there's a good chance most search engines can't either.

Regards

Chris

--
You received this message because you are subscribed to the Google Groups "SOFTplus GSiteCrawler" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gsitecrawler...@googlegroups.com.
To post to this group, send email to gsitec...@googlegroups.com.
Visit this group at http://groups.google.com/group/gsitecrawler.
For more options, visit https://groups.google.com/d/optout.

webado

Nov 25, 2015, 3:22:16 PM
to gsitec...@googlegroups.com
Google can execute javascript but gsc cannot.

Chris Wright

Nov 25, 2015, 4:00:51 PM
to gsitec...@googlegroups.com
On 25/11/2015 20:22, webado wrote:
> Google can execute javascript but gsc cannot.

Which is why the recommended approach was* to create a workaround by
having your server present a quasi-static version to crawlers & search
engines that doesn't require the use of JavaScript (or dynamically
created content).

Google called it an HTML Snapshot.
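
For anyone unfamiliar with how a crawler asked for that snapshot: under the AJAX crawling scheme, a "hashbang" URL (one whose fragment starts with `#!`) was rewritten so the fragment travelled in an `_escaped_fragment_` query parameter, and the server was expected to answer that request with the static HTML. Below is a minimal Python sketch of the rewrite. The function name and the example.com URLs are made up for illustration, and the escaping is deliberately conservative (it percent-encodes more characters than the scheme strictly required).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def escaped_fragment_url(url: str) -> str:
    """Map a #! URL to the URL a crawler would request for the snapshot.

    Hypothetical helper for illustration only; not part of GSiteCrawler.
    """
    parts = urlsplit(url)
    # Only hashbang fragments ("#!...") took part in the scheme.
    if not parts.fragment.startswith("!"):
        return url
    # Drop the leading "!" and percent-encode the rest so that characters
    # such as "&", "#", "%", "+" and spaces survive inside a query string.
    encoded = quote(parts.fragment[1:], safe="=/!")
    param = "_escaped_fragment_=" + encoded
    # Append to any query string that is already present.
    query = parts.query + "&" + param if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))


if __name__ == "__main__":
    print(escaped_fragment_url("http://example.com/#!page=2"))
    # http://example.com/?_escaped_fragment_=page=2
```

A crawler that knows the scheme would fetch the rewritten URL instead of the `#!` one; a crawler that doesn't will only ever see the page shell.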

*Although as of October 2015 Google no longer recommends the AJAX
crawling method for getting your content into Google's SERPs, it is a
process which can still help some crawlers obtain a snapshot of your
dynamically generated content.
So long as you don't block Google's servers from accessing your JS and
CSS, they can now grab a copy of your dynamic content and don't need the
HTML Snapshot workaround themselves.
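
One quick way to check the "don't block your JS and CSS" part is to run your robots.txt rules against a few asset URLs. The sketch below uses Python's standard `urllib.robotparser`; the function name, the sample rules and the example.com URLs are all assumptions for illustration. Note that this parser does simple prefix matching and, as far as I know, does not handle the wildcard patterns Google supports, so treat the result as a rough check only.

```python
from urllib.robotparser import RobotFileParser


def blocked_assets(robots_txt: str, asset_urls: list[str],
                   agent: str = "Googlebot") -> list[str]:
    """Return the asset URLs that the given robots.txt text disallows.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return [url for url in asset_urls if not parser.can_fetch(agent, url)]


if __name__ == "__main__":
    # Sample rules and URLs are invented for this example.
    rules = "User-agent: *\nDisallow: /js/\n"
    assets = [
        "http://example.com/js/app.js",
        "http://example.com/css/site.css",
    ]
    print(blocked_assets(rules, assets))
    # ['http://example.com/js/app.js']
```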

But I'm digressing from the OP's question, because I don't
believe GSC can access the HTML snapshots, so either way it won't see
the generated HTML or the dynamic content...
I should have kept quiet ;) doh...

Chris


webado

Nov 25, 2015, 6:54:45 PM
to gsitec...@googlegroups.com
Lol

