.htaccess for SEO bots crawling single page applications WITHOUT hashbangs

Daniel Kanze

Jul 29, 2013, 11:55:04 AM
to ang...@googlegroups.com
On a `pushState`-enabled page, you normally redirect SEO bots using the `_escaped_fragment_` convention. You can read more about that **[here][1]**.

The convention assumes that you will be using a hashbang (`#!`) prefix before all of the URIs on a single-page application. SEO bots will escape these fragments by replacing the hashbang with their own recognizable convention, `_escaped_fragment_`, when making a page request.

    //Your page
    http://example.com/#!home

    //Requested by bots as
    http://example.com/?_escaped_fragment_=home

This allows the site administrator to detect bots, and redirect them to a cached prerendered page.

    RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.*)$
    RewriteRule ^(.*)$  https://s3.amazonaws.com/mybucket/$1 [P,QSA,L]
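
For context, the surrounding configuration for a rule like that usually looks something like the following (a sketch only: the bucket URL is the same placeholder as above, and the `[P]` flag requires `mod_proxy` to be enabled on the server):

    # Sketch: proxy crawler requests to prerendered snapshots.
    # Requires mod_rewrite and mod_proxy.
    RewriteEngine On
    # Only act when the crawler sent the escaped-fragment query parameter
    RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.*)$
    # Proxy the request to the cached snapshot in the bucket
    RewriteRule ^(.*)$ https://s3.amazonaws.com/mybucket/$1 [P,QSA,L]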

The problem is that the hashbang is being phased out quickly by widely adopted `pushState` support. It's also really ugly and isn't very intuitive to a user.

So what if we used HTML5 mode where pushState guides the *entire* user application?

    //Your index is using pushState
    http://example.com/

    //Your category is using pushState (not a folder)
    http://example.com/category

    //Your category/subcategory is using pushState
    http://example.com/category/subcategory

Can rewrite rules guide bots to your cached version using this newer convention? [A related question][2] accounts only for the index edge case. Google also **[has an article][3]** that suggests an *opt-in* method for that single edge case, using `<meta name="fragment" content="!">` in the `<head>` of the page. Again, that covers one edge case; here we are talking about handling *every* page as an *opt-in* scenario:

    http://example.com/?_escaped_fragment_=
    http://example.com/category?_escaped_fragment_=
    http://example.com/category/subcategory?_escaped_fragment_=

I'm thinking that `_escaped_fragment_` could still be used as an identifier for SEO bots, and that I could extract everything between the domain and this identifier to append to my bucket location, like:

    RewriteCond %{QUERY_STRING} ^_escaped_fragment_=$
    # (high level example I have no idea how to do this)
    # extract "category/subcategory" as $2
    # from http://example.com/category/subcategory?_escaped_fragment_=
    RewriteRule ^(.*)$  https://s3.amazonaws.com/mybucket/$2 [P,QSA,L]
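
(One untested thought: because the `RewriteRule` pattern matches only the path portion of the URL, never the query string, the first capture group may already hold `category/subcategory`, so no second capture should be needed. A sketch, with the same placeholder bucket as above:)

    # Untested sketch: the pattern matches the path only, so $1 is
    # "category/subcategory" for /category/subcategory?_escaped_fragment_=
    RewriteCond %{QUERY_STRING} ^_escaped_fragment_=$
    RewriteRule ^(.*)$ https://s3.amazonaws.com/mybucket/$1 [P,QSA,L]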

What's the best way to handle this?

  [1]: https://developers.google.com/webmasters/ajax-crawling/docs/specification
  [2]: http://stackoverflow.com/questions/17108931/how-to-do-a-specific-condition-for-escaped-fragment-with-rewrite-rule-in-htacce
  [3]: https://developers.google.com/webmasters/ajax-crawling/docs/specification


The question is also up on Stack Overflow here:

http://stackoverflow.com/questions/17926219/htaccess-for-seo-bots-crawling-single-page-applications-without-hashbangs

Sam Ward

Apr 7, 2014, 2:13:45 PM
to ang...@googlegroups.com
Having the EXACT same problem... I was thinking about figuring out how to do it with .htaccess, as you have done here. What did you end up doing?

Sam Ward

Apr 7, 2014, 4:54:38 PM
to ang...@googlegroups.com
For now, I'm using .htaccess like this:

    # Serve the prerendered snapshot when a crawler requests the root page;
    # the trailing ? in the substitution drops the query string
    RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.*)$
    RewriteRule ^$ /snapshots/index.html? [L,NC]
    # Same for deeper paths: /category -> /snapshots/category.html
    RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.*)$
    RewriteRule ^(.*)$ /snapshots/$1.html? [L,NC]

Not sure if there's a better solution, but it's working for me so far. Just be sure to have the directory structure for your snapshots match the URL structure.
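
If you want to guard against snapshots that haven't been generated yet, a file-existence check can keep crawlers from being rewritten to a 404 (an untested sketch; it assumes snapshots live under `DOCUMENT_ROOT/snapshots`, and the root page would still need the separate index rule above):

    # Untested sketch: only rewrite when a matching snapshot file exists on disk
    RewriteCond %{QUERY_STRING} ^_escaped_fragment_=(.*)$
    RewriteCond %{DOCUMENT_ROOT}/snapshots%{REQUEST_URI}.html -f
    RewriteRule ^(.*)$ /snapshots/$1.html? [L,NC]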

