

Eunice,
A couple thoughts:
A jsessionid showing in the URL is, AFAIK, an indication that cookies are being blocked or not working correctly – they don’t show up at most Dataverse sites. There are some old community emails on this topic – for example, I found a comment of mine noting that I’d seen this caused by a load balancer in one thread. It may be worth figuring out and fixing why the jsessionid is appearing, if you’re not configured to use it for some specific reason. (There is also a switch in the web.xml file, commented out by default, that forces cookie-only session tracking. If something in your configuration is blocking cookies, forcing this could break normal use, but it could be worth a try.)
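For reference – I haven’t checked the current Dataverse web.xml, so treat this as a sketch of the standard Servlet 3.0+ mechanism rather than the exact commented-out block – forcing cookie-only session tracking looks like:

```xml
<!-- Standard Servlet 3.0+ session-config: COOKIE-only tracking
     disables jsessionid URL rewriting entirely. If cookies are
     blocked in your setup, sessions (and logins) will break. -->
<session-config>
    <tracking-mode>COOKIE</tracking-mode>
</session-config>
```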
In terms of indexing, I’m not sure what the best approach would be if you can’t eliminate the jsessionids. The Google URL Parameters tool guidance re: parameters in the “No: Doesn't affect page content” category seems like it would avoid duplicates – I think that would tell Google to index only one URL per page regardless of jsessionid (if it doesn’t already – the image in your email doesn’t show any duplicates, so perhaps Google already understands that it doesn’t need to track URLs with different jsessionids). The URL rewrite methods could make sense, but if you just drop all jsessionids for all users while cookies aren’t working, you’d probably be blocking logins as well. In any case, I don’t recall any info shared about this in the community while I’ve been around.
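If you do go the rewrite route, a common pattern (this is my sketch for Apache mod_rewrite – adjust for whatever proxy/load balancer you run, and note the login caveat above) is to redirect URLs containing a jsessionid to the clean URL:

```apache
# Strip ;jsessionid=... from request URLs via a 301 redirect.
# Caution: if cookies are blocked, this will break logins,
# since the session can no longer ride along in the URL.
RewriteEngine On
RewriteCond %{REQUEST_URI} ^(.*);jsessionid=[^;?]+(.*)$ [NC]
RewriteRule .* %1%2 [R=301,L]
```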
Re: sitemap.xml – just looking a little bit, it appears that the sitemap is available at both /sitemap.xml and /sitemap/sitemap.xml, and the latter should be visible to robots via the

Allow: /sitemap/

line in the suggested robots.txt. Hopefully if you tell Google to retrieve /sitemap/sitemap.xml instead, it will work. That said, I’m not sure why Google shouldn’t read /sitemap.xml, and now that I look, the guides say that other search engines may use that copy, but the robots.txt example Disallows / and so blocks it. So – there appears to be some inconsistency in allowing other engines to access the sitemap that could/should be figured out. Others probably know more of the history here.
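To make the inconsistency concrete, the relevant part of the sample robots.txt is roughly this shape (my paraphrase, not the canonical file) – /sitemap/sitemap.xml is reachable, but the default-location /sitemap.xml falls under the Disallow:

```
User-agent: *
Allow: /sitemap/
Disallow: /
```

Since robots.txt rules are matched by path prefix, /sitemap.xml does not match the Allow: /sitemap/ prefix (note the trailing slash) and is therefore blocked by Disallow: /.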
-- Jim
--
You received this message because you are subscribed to the Google Groups "Dataverse Users Community" group.
To unsubscribe from this group and stop receiving emails from it, send an email to
dataverse-commu...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/dataverse-community/bc725128-4f91-4408-9b99-1b5d141fd1d0n%40googlegroups.com.
If you figure out a good practice with jsessionid, it would be great to get it back in the guide.
Re: sitemaps – I just added a comment on #8329 suggesting that we might want to add Allow: /sitemap.xml to the sample robots file (which already allows /sitemap/), which would cover your case below. It sounds like you can add a Sitemap: directive to robots.txt to help indexers find it too. In any case – if you find out more, please comment/link info there. (For Google, you can also just retract the /sitemap.xml you submitted and add /sitemap/sitemap.xml, which should be allowed by the /sitemap/ entry in the sample robots.txt, but that doesn’t work for robots that expect a default /sitemap.xml location.)
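Putting those two suggestions together, an amended robots.txt might look roughly like this (my sketch, not what’s in the guides yet; the hostname is a placeholder):

```
# Allow crawlers to fetch the sitemap from either location,
# while still disallowing general crawling.
User-agent: *
Allow: /sitemap.xml
Allow: /sitemap/
Disallow: /

# Point indexers at the sitemap explicitly.
Sitemap: https://dataverse.example.edu/sitemap/sitemap.xml
```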
Thanks,