Hi Glenn,
This is interesting! When I went to the description example you provided, it was displaying in English for me. I also tried using the new public CSV export from the clipboard on your site, and when I looked at the item-level record, the culture listed was en. I thought at first that the page served up might in part be determined by the default culture settings in a user's browser (our accesstomemory.org website does this, for example) - but after checking with our team, it seems that AtoM is not performing any end-user language detection at present. So... it's not that.
In terms of Google crawlers, I think the best short-term solution for you will be to add a robots.txt file to your AtoM instance with rules disallowing the culture URL extensions. If you're unfamiliar with the robots.txt protocol, here's some basic information:
You'll find many more resources like this with a simple web search.
You should be able to add a disallow rule for certain URL parameters - this thread should give you a starting place:
This is untested, but for this particular use case, I think the rule would be something along the lines of:
- Disallow: /*?sf_culture=*
You'll probably need to do a bit more reading and testing to make sure my best guess is accurate, but that should get you started in the right direction. Hopefully this way, Google and other crawlers that respect the protocol will stop indexing other languages. Keep in mind that some crawlers are jerks - there's nothing that compels them to respect a robots.txt directive - but most of the big ones like Google will.
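If it helps with the testing, here is a rough Python sketch for checking candidate rules against sample URLs before you deploy anything. It assumes the wildcard behaviour Google documents for robots.txt (* matches any run of characters, a trailing $ anchors the end of the URL); it is not Google's actual matcher, and the sample paths are made up for illustration. One thing it shows is that a rule starting with /*?sf_culture= only catches URLs where sf_culture is the first query parameter, so I've added a second, equally untested rule for the &sf_culture= case. In a real robots.txt file, each pattern would go on its own Disallow: line under a User-agent: * line.

```python
import re


def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics (per Google's published robots.txt behaviour):
    '*' matches any run of characters, a trailing '$' anchors the end
    of the URL, and everything else is literal text.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path: str, patterns: list) -> bool:
    """Return True if any pattern matches from the start of the path."""
    return any(robots_pattern_to_regex(p).match(path) for p in patterns)


# Candidate rules (untested guesses): one for sf_culture as the first
# query parameter, one for sf_culture following another parameter.
RULES = ["/*?sf_culture=", "/*&sf_culture="]

# Hypothetical paths, for illustration only.
SAMPLES = [
    "/index.php/example-fonds?sf_culture=fr",
    "/index.php/informationobject/browse?page=2&sf_culture=de",
    "/index.php/example-fonds",
]

for path in SAMPLES:
    verdict = "blocked" if is_disallowed(path, RULES) else "allowed"
    print(path, "->", verdict)
```

Crawlers that don't support wildcards at all will treat these rules differently, so this is only a sanity check for the big search engines.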
Some other things that may help - I'm not sure, but it's worth a try:
First, in Admin > Settings > i18n Languages, you could remove all the other cultures that you are not actively using in your application. You'll want to re-index after any changes you make here. See:
If desired, you can even disable the Language menu completely if you are not using it, via the Default page elements settings. See:
Neither of these options will prevent someone from manually adding ?sf_culture=fr or another culture variable to the end of a URL. However, they may keep more web crawlers (and users) from stumbling across these pages.
Finally, this is not directly related to your issue, but it might still be of interest - did you know that AtoM has a CLI task to help improve search engine optimization? See:
If you make the changes above and add a robots.txt file to your site, then you might consider running this task using the --ping option. Essentially, this option will alert Google and Bing to the new sitemap files and ask for your site to be reindexed. Hopefully this will replace the current results that you are seeing, and those crawlers should respect any directives found in your robots file.
There are ways we could improve this in the application itself, to avoid this kind of issue - for example, adding some kind of custom filter in the application that ignores the ?sf_culture= parameter in certain circumstances - however, such a solution would require development. If UTAS is interested in pursuing this option, please feel free to contact me off-list, and I can coordinate with our developers to prepare some estimates for you.
Cheers,