SWFObject and SEO


TheCosmonaut

Oct 25, 2008, 3:38:53 PM
to SWFObject
I just read Brian Ussery's posts
(http://www.beussery.com/blog/index.php/tags/swfobject/) on Googlebot
and Flash. It was news to me that Googlebot is now traversing
SWFObject, which means that Googlebot
now skips any (X)HTML content inside the SWFObject divs and instead
indexes the Flash. While in theory, this is great (meaning that Google
can start indexing Flash), the reality is that Googlebot still has a
long way to go to correctly index Flash files, and the fact that
Googlebot is traversing SWFObject means that SEO for Flash has become
MORE difficult.

My biggest worry is this: Almost all the Flash sites I build are
dynamically-driven by necessity. This makes the content in them really
difficult for Googlebot to index (Googlebot reads external data as
separate pages, has trouble following links, and so on). Whereas
before I could use SWFObject to deliver a (X)HTML version of the
content to Googlebot (for SEO) and Flash content to users with Flash
(for visual and user-impact), it seems I can't do that anymore.

So based on all this, I've got two questions:
1) Is there a way in SWFObject to FORCE an (X)HTML output if SWFObject
detects the visitor is a robot?
2) What are other tactics people are using in order to optimize
dynamically-driven Flash sites for Google?

Thanks!

--eric

Bobby

Oct 26, 2008, 5:55:20 AM
to SWFObject
Hi Eric,

Thanks for pointing us to the link. It's always good to see new
research popping up.

Re: which means that Googlebot now skips any (X)HTML content inside
the SWFObject divs and instead indexes the Flash.

Incorrect, it doesn't skip the HTML content. It indexes both your HTML
and Flash content, and makes a decision which content it will show to
a certain visitor as a search result.

The main question is: which are the variables in this decision
process? Does it take the visibility of content into account? E.g. if
you only show a Flash video with no textual content Googlebot will
probably index nothing, however if you provide descriptive alternative
content will it show these results instead? Also does the type of user
agent that makes a search request influence the process? E.g. Google
would ideally only like to show search results based on what someone
with a particular user agent can see, so a text browser should render
different results than Firefox with Flash installed, and in case of
dynamic publishing with JavaScript enabled. But you can already see
the complexity here, how does Google know that you have the required
version of Flash Player installed or JavaScript enabled? And what
about the difference between static and dynamic publishing? Brian
Ussery gives some good new insight in how Google deals with web
content today, however his first conclusion is certainly not that
conclusive at all. There is a lot more research that needs to be done
to describe Google's internal logic.

And I agree with you: that Google indexes Flash content is a great
first step, and I expect many more to follow. Search engines
are far more optimized to crawl HTML content, so in certain ways this
is a case of two steps forward, one step back. Also, HTML has more
descriptive qualities than Flash, and it also contains hierarchy and
semantics, so it will be interesting to see how Flash vs HTML indexing
will evolve into the future. But in favor of Google don't forget that
the meat of the Flash content on the Web today does not use any
fallback content at all, not even for people without the proper
technology support, so at least Google will now index those websites.

That brings me to the next point. SEO is a very dynamic topic.
Especially when it comes to the web authoring part of SEO, I
personally have the opinion that SEO guidelines should NEVER conflict
with good web authoring practices. So when at moment X search engine
Y's results can be pimped by using H1 elements in a web page only
(which is obviously a bad web authoring practice), you should never
apply these guidelines, because they are likely not to do a website
any good in the near or long future.

> My biggest worry is this: Almost all the Flash sites I build are
> dynamically-driven by necessity. This makes the content in them really
> difficult for Googlebot to index (Googlebot reads external data as
> separate pages, has trouble following links, and so on). Whereas
> before I could use SWFObject to deliver a (X)HTML version of the
> content to Googlebot (for SEO) and Flash content to users with Flash
> (for visual and user-impact), it seems I can't do that anymore.

Please keep in mind that SWFObject is a best practice web authoring
technique for embedding SWF content and NOT an SEO optimization tool.
The main purpose for using alternative content is to display
descriptive content to people or software with insufficient technology
support. And yes, this also enables you to create search engine-
friendly content, however maybe to a different target audience than
you had in mind.
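
To make the role of alternative content concrete, here is a minimal sketch of SWFObject 2 dynamic publishing. It assumes the page already holds descriptive HTML inside an element with the hypothetical id "flashContent"; the SWF file name, dimensions and required version are made up for illustration. Only swfobject.embedSWF itself is part of the library.

```javascript
// Sketch only. The page is assumed to contain alternative content such as:
//   <div id="flashContent"><h1>Site title</h1><p>Description...</p></div>
// "site.swf", "flashContent", the size and the version are hypothetical values.
function embedWithFallback(swfobjectLib) {
  // If the library is not available (no JavaScript, script not loaded),
  // nothing is replaced and the HTML alternative content stays in place.
  if (!swfobjectLib || typeof swfobjectLib.embedSWF !== "function") {
    return false;
  }
  // SWFObject replaces the element's content with the SWF only when the
  // required Flash Player version (here 9.0.0) is detected.
  swfobjectLib.embedSWF("site.swf", "flashContent", "800", "600", "9.0.0");
  return true;
}

// In a browser with swfobject.js loaded, the global "swfobject" exists.
embedWithFallback(typeof swfobject !== "undefined" ? swfobject : null);
```

Any visitor or software that cannot run the script or lacks the required player is left with the HTML inside the element, which is why that content should describe the same thing the SWF shows.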

> So based on all this, I've got two questions:
> 1) Is there a way in SWFObject to FORCE an (X)HTML output if SWFObject
> detects the visitor is a robot?

You should never abuse SWFObject for this purpose. Google respects
mechanisms like robots.txt, so maybe you could use that for your
purpose.
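
One possible reading of the robots.txt suggestion is to keep Googlebot away from the standalone SWF files so that only the HTML pages are crawled. This is a sketch under assumptions, not something SWFObject or Google prescribes for this case: it assumes every SWF is served with a .swf extension, and it relies on Googlebot's support for the * and $ pattern characters. Note the trade-off: text inside a blocked SWF would not be indexed at all.

```
# Hypothetical robots.txt: stop Googlebot from fetching any URL ending in .swf
User-agent: Googlebot
Disallow: /*.swf$
```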

> 2) What are other tactics people are using in order to optimize
> dynamically-driven Flash sites for Google?

The web authoring part is only a small part of SEO, really. The content
you supply, linkage from and to that content, and many more factors
(e.g. update intervals, content and linkage built over time)
eventually will determine how well a website ranks. For the web
authoring part I can only advise to use good web authoring practices,
don't rely on tricks, follow Google's web authoring guidelines and use
your common sense.

TheCosmonaut

Oct 26, 2008, 1:56:27 PM
to SWFObject
Hello Bobby -

Thank you so much for such a thorough and well-thought-out response! I
was not aware that Google indexed both the (X)HTML and the Flash --
that definitely changes everything.

Also, I definitely take to heart your two main points:
1) SWFObject is a best practice web authoring technique for
embedding SWF content and NOT an SEO optimization tool.
2) Web authoring in general should be faithful to semantics (NOT to
trickery, abusing code/technologies, etc.)

I couldn't agree more with both! My only issue is this: Since Flash
has very little in the way of standardized descriptive qualities,
hierarchies, and semantics for search engines, it seems like the
burden falls upon us Flash developers to provide descriptions,
hierarchies, and semantic contexts that Googlebot can understand. At
this point, it seems to me like the only way to do THAT is to provide
a search-engine-friendly "index", "site map", or "mirror" of the Flash
content. It seems to me like a totally valid use of SWFObject is to
provide a mirror of the Flash content to users who can't correctly
interpret Flash. Since Googlebot at this point doesn't correctly
interpret Flash (and this is not a criticism of Google -- I can't
imagine the challenges they face in trying to index content that has
little to no visible structure for them), and if one thinks of
Googlebot as a "user", is it an abuse of SWFObject to serve content to
Googlebot that it CAN correctly interpret? This line of thinking leads
me back to the thought: maybe there should be a way to force SWFObject
to serve (X)HTML content to Googlebot.

I totally subscribe to the belief that our duty as web developers is
to serve content that is accurate (NOT deceiving) both structurally
and semantically. I firmly believe that one should provide content
that degrades accurately according to viewer capabilities, so that
anyone who interacts with your website will receive the same content,
whenever possible. Alternative content MUST match regular content, or
you're being deceitful. It just seems to me like Googlebot is using an
inadequate browser and should be served alternative content. I look
forward to the day when that's not the case, but it seems like it is
our current situation.

What are your thoughts?

Thanks so much!

--eric

beussery

Oct 26, 2008, 7:24:56 PM
to SWFObject
Hey Bobby and thanks for pointing out this thread TheCosmonaut.
Thanks for checking out my post as well....

> It indexes both your HTML
> and Flash content, and makes a decision which content it will show to
> a certain visitor as a search result.

As mentioned in my research, Google isn't attributing content in Flash
with the parent URL or as a single entity. This is still true as you
can see using Google's own example query:
http://www.google.com/search?q=nasa+deep+impact+animation&sourceid=navclient-ff&ie=UTF-8&rlz=1B3GGGL_enUS278US278

In the results, notice www.jpl.nasa.gov/multimedia/deep-impact/index-flash.html
doesn't include "alternative" content and that
www.jpl.nasa.gov/multimedia/deep-impact/index.swf is also indexed.
When swf files are indexed they are accessible to users with or
without Flash. Try the same query on your iPhone and you'll see what
I mean. When users without Flash click on swf files in search results
no progressive enhancement takes place and graceful degradation can't
happen.

> if you only show a Flash video with no textual content Googlebot will
> probably index nothing, however if you provide descriptive alternative
> content will it show these results instead?

Actually the opposite seems to be true in most cases I've seen. My
case study and Google's example is a Flash file at a parent URL
without descriptive alternative text content or any text content in
(X)HTML indexed in search results. In fact text content in Flash has
been associated with the (X)HTML file.

> Also does the type of user
> agent that makes a search request influence the process? E.g. Google
> would ideally only like to show search results based on what someone
> with a particular user agent can see, so a text browser should render
> different results than Firefox with Flash installed, and in case of
> dynamic publishing with JavaScript enabled.

I'm not sure how user-agent Googlebot would know what a user using
user-agent Firefox sees. I'd be careful here because returning
different pages based on user agent could be considered cloaking by
user agent.

If Googlebot has JavaScript enabled and SWFObject works as designed,
I'm not sure how Googlebot would see text in (X)HTML. Can anyone
explain how this would happen?

> But you can already see
> the complexity here, how does Google know that you have the required
> version of Flash Player installed or JavaScript enabled? And what
> about the difference between static and dynamic publishing? Brian
> Ussery gives some good new insight in how Google deals with web
> content today, however his first conclusion is certainly not that
> conclusive at all. There is a lot more research that needs to be done
> to describe Google's internal logic.

There is lots of research to be done, that is for sure! I didn't go
into lots of detail because as you point out, things can change at any
moment but here is some of what is known today.

Google is using Adobe's "Ichabod" Flash player which, like its namesake,
is headless. In other words, #anchors no longer work against
Googlebot seeing text content in Flash.

Googlebot supports SWFObject and as a result may not currently see
alternative text content in underlying (X)HTML. Obviously the point
of SWFObject is to return Flash to users with Flash and user-agents
supporting JavaScript, like the "new" Googlebot. Bobby mentioned H1
abuse (spam) by some Flash sites and I've seen a decline in Flash
rankings for sites using this technique to manipulate rankings. This
would also indicate Googlebot may no longer "see" H1s since SWFObject
support was introduced in July.

As far as dynamic content via XML, Googlebot now sees text content in
"the Flash file" but not dynamic content imported into the Flash file
from another source. My theory is that Ichabod may not yet support
text content from another source.

When it comes to meta data, links and other signals used by search
engines these can be optimized even in Flash by using simple steps
like avoiding "seamless transitions" in text rich sections of a site.

I hope this helps shed light on my research and welcome any feedback
or questions...

-Brian

beussery

Oct 26, 2008, 9:48:42 PM
to SWFObject
One note, when I said "In other words, #anchors no longer work against
Googlebot seeing text content in Flash."

That should be taken to mean Googlebot doesn't still ignore #anchors
it finds in URLs not inside of a Flash file.




Bobby

Oct 27, 2008, 9:07:13 AM
to SWFObject
Hey Brian, good work on the research!

> > It indexes both your HTML
> > and Flash content, and makes a decision which content it will show to
> > a certain visitor as a search result.
>
> As mentioned in my research, Google isn't attributing content in Flash
> with the parent URL or as a single entity. This is still true as you
> can see using Google's own example query:
> http://www.google.com/search?q=nasa+deep+impact+animation&sourceid=navclient-ff&ie=UTF-8&rlz=1B3GGGL_enUS278US278
>
> In the results, notice www.jpl.nasa.gov/multimedia/deep-impact/index-flash.html
> doesn't include "alternative" content and that
> www.jpl.nasa.gov/multimedia/deep-impact/index.swf is also indexed.
> When swf files are indexed they are accessible to users with or
> without Flash. Try the same query on your iPhone and you'll see what
> I mean. When users without Flash click on swf files in search results
> no progressive enhancement takes place and graceful degradation can't
> happen.

This is what I see when I type in the query (both the HTML page and
SWF indexing results are displayed as 2 separate search results,
however clustered together, with the SWF nested under the HTML page):
http://www.bobbyvandersluis.com/swfobject/img/google_deepimpact_20081027.gif

If you just take a look at the source code of the HTML page, there is
no alternative content included, so this might not be the best test
case after all :-(

When you read the Q&A on googlewebmastercentral:
http://googlewebmastercentral.blogspot.com/2008/06/improved-flash-indexing.html

The 3 quotes under "Interaction of HTML pages and Flash" give a good
indication of Google's new direction:

"The text found in Flash files is treated similarly to text found in
other files, such as HTML, PDFs, etc. If the Flash file is embedded in
HTML (as many of the Flash files we find are), its content is
associated with the parent URL and indexed as single entity."

"Serving the same content in Flash and an alternate HTML version could
cause us to find duplicate content. This won't cause a penalty -- we
don’t lower a site in ranking because of duplicate content. Be aware,
though, that search results will most likely only show one version,
not both."

"We’re trying to serve users the most relevant results possible
regardless of the file type. This means that standalone Flash, HTML
with embedded Flash, HTML only, PDFs, etc., can all have the potential
to be returned in search results."

> > if you only show a Flash video with no textual content Googlebot will
> > probably index nothing, however if you provide descriptive alternative
> > content will it show these results instead?
>
> Actually the opposite seems to be true in most cases I've seen.  My
> case study and Google's example is a Flash file at a parent URL
> without descriptive alternative text content or any text content in
> (X)HTML indexed in search results.  In fact text content in Flash has
> been associated with the (X)HTML file.

That's not what I meant: if you serve a web page with 1 swf file that
only shows 1 flv file and no textual content within the swf, and you
serve descriptive alternative content that describes what can be seen
in the video, I doubt that Google will show the blank results of the
crawled swf.

But then again, these many different scenarios are well worth
investigating. I just fear that if you would do elaborate research
now, in 6 months time the results will probably be way different, so
it needs to be studied over time...

An excerpt from your blog entry: http://www.beussery.com/blog/index.php/2008/10/google-flash-seo/

"While the full impact is not yet known, these technologies will
redefine how Flash sites are created, constructed, designed and, as a
result, optimized."

Yes and no.

IMO good web authoring techniques should work now and in 5 years from
now, while SEO techniques usually only stand the test of time when
they overlap with good web authoring techniques.

And both can influence each other. I mean, a few years ago only a
handful of people were promoting techniques like progressive
enhancement and the use of fallback content for plug-in content,
because they totally made sense from a web authoring point of view.
Only millions of authoring implementations (e.g. SWFObject, UFO) later
have search engine vendors picked up this trend. And the current
developments with Flash indexing will of course impact how people
define content within Flash.

I hope that Adobe will play an active role in this too. If the SWF
format can include descriptive features, hierarchy and semantics for
its textual content and links, it will offer web authors the
possibility to optimize their content as can be done with HTML.

Slowly things are getting more mature, it's just a process :-)

beussery

Oct 27, 2008, 11:24:04 AM
to SWFObject
Hey Bobby and thanks for your kind words!

> This is what I see when I type in the query (both the HTML page and
> SWF indexing results are displayed as 2 separate search results,
> however clustered together, the SWF hierarchical under the HTML page): http://www.bobbyvandersluis.com/swfobject/img/google_deepimpact_20081...
>
> If you just take a look at the source code of the HTML page, there is
> no alternative content included, so this might not be the best test
> case after all :-(
>
> When you read the QA on googlewebmastercentral: http://googlewebmastercentral.blogspot.com/2008/06/improved-flash-ind...
>
> The 3 quotes under "Interaction of HTML pages and Flash" give a good
> indication of Google's new direction:
>
> "The text found in Flash files is treated similarly to text found in
> other files, such as HTML, PDFs, etc. If the Flash file is embedded in
> HTML (as many of the Flash files we find are), its content is
> associated with the parent URL and indexed as single entity."

You are correct, that is what Google says but it doesn't seem to be
what they are doing. That is why I'm sharing my research. As your
image clearly illustrates, Google isn't associating text content in
Flash with the correct parent URL or indexing both as a single entity.


> "Serving the same content in Flash and an alternate HTML version could
> cause us to find duplicate content. This won't cause a penalty -- we
> don’t lower a site in ranking because of duplicate content. Be aware,
> though, that search results will most likely only show one version,
> not both."

Exactly, serving the same content won't result in a penalty but may
result in one version being filtered from search results at Google's
discretion.


> IMO good web authoring techniques should work now and in 5 years from
> now, while SEO techniques usually only stand the test of time when
> they overlap with good web authoring techniques.

I couldn't agree more!

Bobby

Oct 27, 2008, 12:54:57 PM
to SWFObject
Re: As your image clearly illustrates, Google isn't associating text
content in Flash with the correct parent URL or indexing both as a
single entity.

But the text "ORBIT PATHS This animation shows the trajectory of Deep
Impact and the orbit ..." as shown in both entries comes from the SWF
file. The problem with this example is that besides the page title
there is zero fallback content and regular indexable HTML content
available.

beussery

Oct 27, 2008, 9:54:06 PM
to SWFObject
Sure, but they should both be indexed as a single entity according to
Google and not two as shown in your image. I've been watching this
cycle for months and unless something has changed since my research
was published, in a few days you can expect to see only the Flash file
indexed.

IE
http://www.google.com/search?q=deep+impact+amy+walsh&pws=0&hl=en&num=10

I see problems with this because users without Flash enabled can
access Flash files in SERPs. Does that make sense?


They were indexed as one (X)HTML prior to Google's support for
SWFObject being introduced but since then only the Flash, or both,
have been indexed in SERPs but never the parent URL alone.

As for the example, I've seen this in a number of other cases and
sites with alternative (X)HTML as well but for obvious reasons can't
publish those examples. The example here is Google's example from
Google Webmaster Central Blog. While it may not be the best for this
discussion it seems fairly accurate of other sites I'm tracking.

Bobby

Oct 28, 2008, 8:06:08 AM
to SWFObject
> Sure, but they should both be indexed as a single entity according to
> Google and not two as shown in your image.

They are in the first entry. Personally I don't mind the second one
being displayed as it is, however I doubt the usefulness of linking
directly to SWF files. I mean, a lot of SWF files require input (e.g.
flashvars, JavaScript communication) with their hosting HTML page.
Also there is no correct sizing (original width and height). A SWF
file is simply often not used like a more static piece of plugin
content like a PDF file.
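
As a rough illustration of what gets lost when a SWF is opened directly, the sketch below shows a hosting page supplying both the dimensions and the flashvars through SWFObject 2. Every value here (file name, element id, size, and the flashvars keys "section" and "dataUrl") is hypothetical; only swfobject.embedSWF and its parameter order come from the library.

```javascript
// Hypothetical embed settings that only the hosting HTML page knows about.
var embedConfig = {
  swfUrl: "site.swf",
  elementId: "flashContent",
  width: "800",   // intended display width
  height: "600",  // intended display height
  minFlashVersion: "9.0.0",
  flashvars: { section: "portfolio", dataUrl: "content.xml" }
};

function embedWithPageInput(swfobjectLib, config) {
  if (!swfobjectLib || typeof swfobjectLib.embedSWF !== "function") {
    return false; // library missing: the alternative content stays visible
  }
  swfobjectLib.embedSWF(
    config.swfUrl,
    config.elementId,
    config.width,
    config.height,
    config.minFlashVersion,
    false,            // no express install SWF; false skips this optional slot
    config.flashvars  // passed to the SWF as flashvars
  );
  return true;
}

embedWithPageInput(typeof swfobject !== "undefined" ? swfobject : null, embedConfig);
```

A search result that points straight at site.swf bypasses this call entirely, so the movie would start without its flashvars and at whatever size the browser window gives it.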

>  I've been watching this
> cycle for months and unless something has changed since my research
> was published, in a few days you can expect to see only the Flash file
> indexed.
>
> IE http://www.google.com/search?q=deep+impact+amy+walsh&pws=0&hl=en&num=10

Yeah, that's bad. And indeed different from what is stated on
googlewebmastercentral.

> I see problems with this because users without Flash enabled can
> access Flash files in SERPs.  Does that make sense?

No, not at all. Why show this to e.g. an iPhone user if he can't even
open the search result?

> They were indexed as one (X)HTML prior to Google's support for
> SWFObject being introduced but since then only the Flash, or both,
> have been indexed in SERPs but never the parent URL alone.
>
> As for the example, I've seen this in a number of other cases and
> sites with alternative (X)HTML as well but for obvious reasons can't
> publish those examples.  The example here is Google's example from
> Google Webmaster Central Blog.  While it may not be the best for this
> discussion it seems fairly accurate of other sites I'm tracking.

That would be a bad direction indeed. Maybe work in progress. Please
send your feedback to the Google Webmaster Central team.

beussery

Oct 28, 2008, 2:29:32 PM
to SWFObject
> They are in the first entry. Personally I don't mind the second one
> being displayed as it is, however I doubt the usefulness of linking
> directly to SWF files. I mean, a lot of SWF files require input (e.g.
> flashvars, JavaScript communication) with their hosting HTML page.
> Also there is no correct sizing (original width and height). A SWF
> file is simply often not used like a more static piece of plugin
> content like a PDF file.

Totally, I don't "mind" both either but according to Google there
should be one entry. Just so you know, your image isn't the norm; in
most cases only the Flash is indexed, without the (X)HTML.

> Why show this to e.g. an iPhone user if he can't even
> open the search result?

That is my point... try the query on an iPhone.

> That would be a bad direction indeed. Maybe work in progress. Please
> send your feedback to the Google Webmaster Central team.

I think you're right, a work in progress for sure. Google's Flash
folks are looking into these issues and I'm happy to provide any
additional information they request. Hopefully they can be resolved
in short order.

Avangelist

Dec 11, 2008, 12:59:15 PM
to SWFObject
Update:

I am sure it has already been stated, but the latest confirmation
from Google is that their spider will check the site and, as it
has no JavaScript or Flash Player as such, it will read the HTML
content first. So use it to get your site indexed for key terms. Don't
forget that it is possible somebody will see that content, so don't try
dirty black hat SEO as you could end up with an ugly page.


Geoff Stearns

Dec 11, 2008, 2:46:15 PM
to swfo...@googlegroups.com
> I am sure it has already been stated, but the latest confirmation
> from Google is that their spider will check the site and, as it
> has no JavaScript or Flash Player as such, it will read the HTML
> content first. So use it to get your site indexed for key terms. Don't
> forget that it is possible somebody will see that content, so don't try
> dirty black hat SEO as you could end up with an ugly page.


Google reads SWF files now, and can pick them up even if you use SWFObject. (You should go back and read the first post in this thread.)

