All,
We discussed this topic in yesterday's Developers Meeting, and realized that the data within the <script> tag is the *cache* (application state) being passed from the server-side to the client-side. This is how Angular applications pass this cached data.
However, it *is possible* to turn this cache off within DSpace by setting this in your config.*.yml:
ssr:
  transferState: false
This setting will turn off the passing of REST queries in that cache, so that the <script> tag will no longer send all the REST queries previously run. This *might* fix the Google Scholar issues (assuming that Google Scholar is finding these bitstreams via that <script> tag), but it does have a side effect.
If you turn off this cache, then you may see a small number of *duplicate* REST API queries when your site switches from SSR to CSR on the first page visit. Essentially, during SSR, the server-side code makes calls to the REST API to build the HTML. But, after switching to CSR, some of those same REST API calls may be made again by the page in the user's browser.
However, I'm told by others who use this "transferState: false" setting in production that the performance impacts are minimal simply because once you switch to CSR you'll stay in CSR. So, these potentially duplicate queries may only occur on the first page you visit in the site.
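To make the duplicate-query side effect concrete, here is a rough sketch of the idea in TypeScript. This is only a toy model, not DSpace or Angular code: FakeRestApi, render, firstPageVisit and the "/rest/..." URLs are all hypothetical names made up for illustration. It just counts how many REST calls happen on a first page visit when the server's cache is (or is not) handed to the browser.

```typescript
// Toy model of SSR -> CSR state transfer. All names here are hypothetical;
// this is not the DSpace or Angular implementation.
type Cache = Record<string, string>;

class FakeRestApi {
  calls = 0;
  get(url: string): string {
    this.calls += 1;
    return `response for ${url}`;
  }
}

// Fetch each URL, reusing a cached response if one is already present.
function render(urls: string[], api: FakeRestApi, cache: Cache): void {
  for (const url of urls) {
    if (!(url in cache)) {
      cache[url] = api.get(url);
    }
  }
}

// Returns the total number of REST calls made for one first-page visit.
function firstPageVisit(transferState: boolean): number {
  const api = new FakeRestApi();
  const urls = ["/rest/item", "/rest/item/bundles"]; // placeholder URLs

  // Server-side rendering always queries the REST API to build the HTML.
  const serverCache: Cache = {};
  render(urls, api, serverCache);

  // With transferState on, the server cache is serialized into the page
  // (the <script> tag) and the browser starts from it. Otherwise the
  // browser starts with an empty cache.
  const clientCache: Cache = transferState
    ? (JSON.parse(JSON.stringify(serverCache)) as Cache)
    : {};
  render(urls, api, clientCache);

  return api.calls;
}

console.log(firstPageVisit(true));  // 2
console.log(firstPageVisit(false)); // 4
```

In this toy model the extra calls only happen once, on the first page, which matches the reports above that the real-world impact is small.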
A new ticket has been created to help describe this problem in more detail:
https://github.com/DSpace/DSpace/issues/11871 (And I've added these same notes to a comment on that ticket.)
While disabling this cache is a potential "quick fix", the more correct solution is likely to restrict access to these bundles/bitstreams altogether (as they are only needed within DSpace). So the other ticket
https://github.com/DSpace/DSpace/issues/11681 is still a more valid solution for the long term.
I'm hoping to find a volunteer to start investigating default access restrictions on these bundles.
If you have other questions, feel free to ask them here or in the tickets.
Tim