DSpace 7 indexing by Google Scholar fails ...


Michael Koch

Feb 5, 2025, 4:41:48 PM
to DSpace Technical Support
Since moving to DSpace 7 (and now 8), I have had major problems with Google Scholar not indexing our items.
When I check the Google Search Console, I find the item pages indexed,
with all the Highwire metadata in them.
So far, so good ...
But Google Scholar does not know the papers.
Checking the Inclusion Guidelines for Webmasters, I found that for inclusion in Scholar the PDF file must be accessible and must include the correct year, title, etc.
... and that the file name has to end in .pdf.
... and our download URLs have no .pdf at the end ...
Could this be the problem?
Any other suggestions?
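
One way to test the ".pdf" theory is to look at what the item page advertises in its Highwire citation_pdf_url meta tag. Below is a minimal sketch, assuming the page HTML has already been fetched; the function name and the repository.example.org URLs are made up for illustration, and it only checks the URL suffix, not whether Scholar would actually accept the file.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class CitationPdfFinder(HTMLParser):
    """Collects the content of every <meta name="citation_pdf_url"> tag."""

    def __init__(self):
        super().__init__()
        self.pdf_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name == "citation_pdf_url" and attr.get("content"):
            self.pdf_urls.append(attr["content"])


def pdf_urls_without_pdf_suffix(html):
    """Return the advertised PDF URLs whose path does not end in '.pdf'."""
    finder = CitationPdfFinder()
    finder.feed(html)
    return [
        url for url in finder.pdf_urls
        if not urlparse(url).path.lower().endswith(".pdf")
    ]


# Hypothetical item page head, for illustration only.
sample_page = (
    '<html><head>'
    '<meta name="citation_pdf_url" '
    'content="https://repository.example.org/bitstreams/1234/download">'
    '<meta name="citation_pdf_url" '
    'content="https://repository.example.org/files/paper.pdf"/>'
    '</head></html>'
)
print(pdf_urls_without_pdf_suffix(sample_page))
```

Any URL this prints is one where the ".pdf at the end" guideline is not met.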

Michael

Edmund Balnaves

Feb 5, 2025, 4:53:30 PM
to DSpace Technical Support
We have also had problems with indexing. There are several aspects: the pages render slowly, so Google's crawl sometimes does not complete, and the SEO metadata is not brilliant. As a workaround, we have had some success substituting a lightweight, SEO-enhanced metadata page for bots (the proxy detects a bot such as Google and serves the lightweight page instead). This has dramatically improved the web performance of the site (by taking off the bot load) and also significantly increased the indexing success of items.
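
For readers wondering what the detection step of such a proxy could look like, here is a minimal sketch. It is not the actual Prosentient setup: the pattern list, the function names, and the "lightweight" / "full_ui" labels are all assumptions made for illustration, and User-Agent strings can be spoofed, so this only shows the routing decision.

```python
import re

# Hypothetical list of crawler markers; a real deployment would maintain its own.
BOT_PATTERN = re.compile(
    r"googlebot|bingbot|duckduckbot|baiduspider|yandex", re.IGNORECASE
)


def is_probable_bot(user_agent):
    """Guess from the User-Agent header whether the client is a crawler."""
    return bool(user_agent) and BOT_PATTERN.search(user_agent) is not None


def choose_backend(user_agent):
    """Pick which page variant the proxy should serve for this request."""
    return "lightweight" if is_probable_bot(user_agent) else "full_ui"


googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
print(choose_backend(googlebot_ua))
print(choose_backend(browser_ua))
```

The lightweight page would need to carry the same metadata and file links as the full item page, otherwise the bot indexes something different from what users see.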

If you set up Google Search Console, you should be able to see how well Google is indexing your site, which gives a rough guide to how well Scholar can pick it up (indicative only, as Scholar does its own thing).

Adding schema.org markup and additional metadata also helps: DC metadata headers, file references, and schema.org headers, e.g.:

<script type="application/ld+json">
{
  "@context": "https://schema.org/",
  "@type": "Publication",
  "name": "General Practice Sleep Scale - The &quot;GPSS&quot; - A proposed new tool for use in General Practice for risk assessment of Obstructive Sleep Apnoea.",
  "author": { "@type": "Person", "name": "Howarth, Timothy" },
  "datePublished": "2024-11-21",
  "description": "",
  "prepTime": "PT1M"
}
</script>
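
A note for anyone copying the example: as far as I can tell, schema.org has no "Publication" type (ScholarlyArticle is the closest defined one), "prepTime" is a Recipe property, and HTML entities such as &quot; are not decoded inside a script element, so quotes are better escaped as JSON. Here is a minimal sketch that generates the block from item metadata; the function name and the choice of ScholarlyArticle are my assumptions, not what the site above actually does.

```python
import json


def build_json_ld(title, author_name, date_published):
    """Build a JSON-LD <script> block for one item (illustrative sketch)."""
    data = {
        "@context": "https://schema.org/",
        "@type": "ScholarlyArticle",  # assumed substitute for "Publication"
        "name": title,
        "author": {"@type": "Person", "name": author_name},
        "datePublished": date_published.strip(),  # drop stray whitespace
    }
    # json.dumps escapes quotes as \" ; additionally escape "<" so a title
    # containing "</script>" cannot end the element early.
    payload = json.dumps(data, ensure_ascii=False, indent=2).replace("<", "\\u003c")
    return '<script type="application/ld+json">\n' + payload + "\n</script>"


print(build_json_ld(
    'General Practice Sleep Scale - The "GPSS"',
    "Howarth, Timothy",
    " 2024-11-21",
))
```

Generating the block with a JSON serializer rather than string templates avoids the escaping and whitespace problems by construction.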

Edmund Balnaves
Prosentient Systems

hb wooley

Feb 6, 2025, 9:47:00 AM
to DSpace Technical Support
Edmund Balnaves, can you give more detail on how you are applying this workaround?

Thank you in advance.

DSpace Technical Support

Feb 10, 2025, 12:08:21 PM
to DSpace Technical Support
All,

Just a few quick notes. 

As of 8.1 and 7.6.3, we've applied some major improvements to how Server-Side Rendering (SSR) is processed in the User Interface. We *believe* these SSR improvements should help with bot traffic, as their goal is to *limit* which pages are accessible to bots. Most bots cannot process JavaScript, which means that they should not be able to access pages which do not undergo SSR. This means that upgrading to 8.1 or 7.6.3 should help with indexing speed, as the "good bots" (like Google Scholar) will still be able to access the data they need to index via your Sitemap, while the "bad bots" will no longer be able to crawl the entire site (especially the search/browse interfaces).
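
Since the sitemap is the route the "good bots" are expected to use, it can be worth confirming which URLs it actually lists. Below is a minimal sketch, assuming the sitemap XML has already been downloaded and follows the standard sitemaps.org schema; the function name and the repository.example.org URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locations(xml_text):
    """Return every <loc> URL in a sitemap or sitemap index document."""
    root = ET.fromstring(xml_text)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Hypothetical sitemap content, for illustration only.
sample_sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://repository.example.org/items/abc</loc></url>"
    "<url><loc> https://repository.example.org/items/def </loc></url>"
    "</urlset>"
)
for url in sitemap_locations(sample_sitemap):
    print(url)
```

If item pages are missing from this list, crawlers that rely on the sitemap have no direct path to them.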

Second, I wanted to note that we've worked directly with the Google Scholar team to develop recommendations for how to get the best Search Engine Optimization out of DSpace.  Those guidelines are in our documentation: https://wiki.lyrasis.org/display/DSDOC8x/Search+Engine+Optimization   Make sure to follow those guidelines, as they will help ensure your site is fully indexed.   I regularly receive feedback from the Google Scholar team about indexing DSpace, and we continually improve things in each release based on that feedback.  So, one of the most important things you can do is to stay up-to-date on the latest version of DSpace.

Tim
