Hi Eric,
Thanks for pointing us to the link. It's always good to see new
research popping up.
> which means that Googlebot now skips any (X)HTML content inside
> the SWFObject divs and instead indexes the Flash.
Incorrect: it doesn't skip the HTML content. It indexes both your HTML
and your Flash content, and then decides which of the two to show a
given visitor in the search results.
The main question is: what are the variables in this decision process?
Does it take the visibility of content into account? E.g. if you only
show a Flash movie with no textual content, Googlebot will probably
index nothing; but if you provide descriptive alternative content,
will it show those results instead?

Does the user agent that makes a search request influence the process
as well? Ideally, Google would only show search results based on what
someone with a particular user agent can see, so a text browser should
get different results than Firefox with Flash Player installed, and,
in the case of dynamic publishing, with JavaScript enabled. But you
can already see the complexity here: how does Google know whether you
have the required version of Flash Player installed or JavaScript
enabled? And what about the difference between static and dynamic
publishing?

Brian Ussery gives some good new insight into how Google deals with
web content today, however his first conclusion is certainly not that
conclusive. A lot more research needs to be done to describe Google's
internal logic.
And I agree with you: that Google indexes Flash content is a great
first step, and I expect many more to follow. Search engines are far
better optimized to crawl HTML content, so in certain ways this is a
case of two steps forward, one step back. HTML also has more
descriptive qualities than Flash, including hierarchy and semantics,
so it will be interesting to see how Flash vs HTML indexing evolves in
the future. But in Google's favor, don't forget that most Flash
content on the Web today doesn't use any fallback content at all, not
even for people without the proper technology support, so at least
Google will now index those websites.
That brings me to the next point. SEO is a very dynamic topic.
Especially when it comes to the web authoring part of SEO, my opinion
is that SEO guidelines should NEVER conflict with good web authoring
practices. So if at moment X search engine Y's rankings can be gamed
by marking up a page using only H1 elements (which is obviously a bad
web authoring practice), you should never apply such a guideline,
because it is unlikely to do a website any good in the near or long
term.
> My biggest worry is this: Almost all the Flash sites I build are
> dynamically-driven by necessity. This makes the content in them really
> difficult for Googlebot to index (Googlebot reads external data as
> separate pages, has trouble following links, and so on). Whereas
> before I could use SWFObject to deliver a (X)HTML version of the
> content to Googlebot (for SEO) and Flash content to users with Flash
> (for visual and user-impact), it seems I can't do that anymore.
Please keep in mind that SWFObject is a best-practice web authoring
technique for embedding SWF content, NOT an SEO optimization tool. The
main purpose of alternative content is to display descriptive content
to people or software with insufficient technology support. And yes,
this also enables you to create search-engine-friendly content, though
perhaps for a different target audience than you had in mind.
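To illustrate, a minimal SWFObject 2.x embed with descriptive
alternative content might look like this (the file names, element id,
and dimensions are placeholders, not part of the library):

```html
<!-- The div's contents are the alternative content: shown to any
     visitor or robot without Flash Player support, and replaced by
     the SWF when a sufficient player version is detected. -->
<div id="myContent">
  <h2>Portfolio</h2>
  <p>Descriptive text, images, and links that mirror the Flash content.</p>
</div>

<script type="text/javascript" src="swfobject.js"></script>
<script type="text/javascript">
  // embedSWF(swfUrl, replaceElemId, width, height, requiredFlashVersion)
  swfobject.embedSWF("portfolio.swf", "myContent", "780", "420", "9.0.0");
</script>
```

The alternative content serves everyone without Flash support first,
and search-engine friendliness follows from that, not the other way
around.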
> So based on all this, I've got two questions:
> 1) Is there a way in SWFObject to FORCE an (X)HTML output if SWFObject
> detects the visitor is a robot?
You should never abuse SWFObject for this purpose: serving search
engine robots different content than human visitors is a form of
cloaking, and that can get a site penalized. Google does respect
mechanisms like robots.txt, so maybe you could use that for your
purpose.
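For example, a robots.txt file could keep Googlebot away from the SWF
files themselves, so that only the HTML pages get crawled (a sketch;
the path is hypothetical and depends on your site structure):

```
User-agent: Googlebot
Disallow: /swf/
```

This steers the crawler rather than serving it different markup, so it
stays within what Google's guidelines allow.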
> 2) What are other tactics people are using in order to optimize
> dynamically-driven Flash sites for Google?
The web authoring part is only a small part of SEO, really. The
content you supply, the links to and from that content, and many more
factors (e.g. update intervals, and content and linkage built up over
time) will eventually determine how well a website ranks. For the web
authoring part I can only advise you to use good web authoring
practices: don't rely on tricks, follow Google's web authoring
guidelines, and use your common sense.