doc.sagemath.org gone from google


William Stein

Aug 7, 2016, 11:25:56 AM
to sage-devel, Harald Schilly

Harald Schilly

Aug 7, 2016, 11:53:17 AM
to William Stein, sage-devel
Does anyone know who administers the combinat pages? Hiding those pages could help. Besides that, I've been working with Paul Masson for months now to fix these indices. It's a stubborn problem, though...

-- h
--
Harald Schilly -- SageMath, Inc.
https://cloud.sagemath.com

Harald Schilly

Aug 7, 2016, 12:02:44 PM
to William Stein, sage-devel

Volker Braun

Aug 7, 2016, 12:14:39 PM
to sage-devel, h...@sagemath.com
Presumably Google decided that combinat is the canonical URL, right? Hiding combinat only kicks the problem down the road; there are presumably many copies of the Sage docs hosted elsewhere. The correct solution would be to include a <link rel="canonical" href="http://doc.sagemath.org/path/to/help.html"/> in our docs to disambiguate, for example as in https://github.com/Pylons/pylons_sphinx_theme/pull/8/files
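
To make the canonical-link suggestion concrete, here is a minimal Python sketch of adding such a tag to an already-built HTML page. It is not how the Sage docs or the linked Pylons theme do it; the base URL is taken from the location named later in this thread, and the function names `canonical_link` and `add_canonical` are hypothetical.

```python
from html import escape
from urllib.parse import urljoin

# Assumption: the canonical root is the doc.sagemath.org/html/en location
# mentioned in this thread. Adjust to the real deployment.
CANONICAL_BASE = "http://doc.sagemath.org/html/en/"


def canonical_link(relative_path, base=CANONICAL_BASE):
    """Return a <link rel="canonical"> tag for one documentation page."""
    url = urljoin(base, relative_path.lstrip("/"))
    return '<link rel="canonical" href="{}"/>'.format(escape(url, quote=True))


def add_canonical(html, relative_path, base=CANONICAL_BASE):
    """Insert the tag just before </head>.

    Pages that already carry a canonical link, or that have no </head>,
    are returned unchanged.
    """
    if 'rel="canonical"' in html or "</head>" not in html:
        return html
    tag = canonical_link(relative_path, base)
    return html.replace("</head>", "  " + tag + "\n</head>", 1)


# Hypothetical page, for illustration only.
page = "<html><head><title>Tutorial</title></head><body></body></html>"
print(add_canonical(page, "tutorial/index.html"))
```

Every hosted copy of a page would then point at the same URL, whichever host serves it. In practice the tag would more likely be emitted by the Sphinx theme template than patched into the output afterwards.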

Harald Schilly

Aug 7, 2016, 12:17:57 PM
to Volker Braun, sage-devel
Yes, you are right, this sounds like a really good idea!

-- h

Paul Masson

Aug 7, 2016, 7:04:27 PM
to sage-devel, vbrau...@gmail.com
As I understand them, canonical URLs distinguish between the same content accessed through different URL forms. Currently Google thinks the documentation is mostly located at www.sagemath.org/doc/ while we want it to appear at doc.sagemath.org/html/en, so the canonical links should help with that distinction. It's unclear to me whether they will have any effect on combinat.sagemath.org/doc/, since that is much older content and Google should be able to tell the difference.

The biggest problem right now, however, is that Google is taking a very long time to index doc.sagemath.org at all. At the beginning of June I created sitemaps listing every HTML and PDF document in the Sage documentation and Harald submitted them to Google promptly. After about a month most of the PDF links were indexed and those documents appear at the new location. At about the same time fewer than a third of the HTML links were listed as indexed by Google. That figure has been steady for the last month, and is currently just a bit over one third.
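
For anyone who has not seen one, a sitemap of this kind is just an XML list of URLs. The following Python sketch shows the general shape under stated assumptions; it is not the script actually used for the Sage sitemaps, and the site root, the example paths and the name `build_sitemap` are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: site root of the new documentation host.
BASE_URL = "http://doc.sagemath.org/"


def build_sitemap(paths, base_url=BASE_URL):
    """Return sitemap XML as a string, with one <url><loc> entry per path."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    # De-duplicate and sort so repeated runs give the same file.
    for path in sorted(set(paths)):
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url + path.lstrip("/")
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical file list, for illustration only.
paths = [
    "html/en/tutorial/index.html",
    "pdf/en/tutorial/SageTutorial.pdf",
]
print(build_sitemap(paths))
```

The sitemap protocol caps a single file at 50,000 URLs, so a large documentation tree may need several files plus a sitemap index.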

In order to prompt Google to index the sitemap, over the last month I have manually removed links to documents at www.sagemath.org/doc/ in Google's webmaster tools and added them to the index at doc.sagemath.org/html/en. This has had little effect: even links I've added manually several times still will not appear in a search including site:doc.sagemath.org. Google is just taking its time to index the new location, even with an explicit sitemap of all HTML files.

If someone knows a better way to prompt Google to index a sitemap I'm all ears, but I don't think it's possible. If we want the documentation to appear at doc.sagemath.org we simply need to wait until Google indexes things. Adding canonical links isn't likely to have an effect if the documents aren't getting indexed.

Regarding combinat.sagemath.org/doc/ I don't think removing it is a displacement of the problem. This is the only other copy of the documentation in the sagemath.org domain. Presumably Google thinks it's relevant even though old because it is located in the same /doc/ directory where Google thinks the newer documentation is located. Removing it from the Internet and pointing to the newer version at doc.sagemath.org would be in the best interest of the entire Sage community, if anyone knows who has access to it.

TL;DR waiting for Google to index...

leif

Aug 10, 2016, 8:53:45 AM
to sage-...@googlegroups.com
Shouldn't it be *docs*.sagemath.org anyway?

Like docs.python.org, docs.cython.org, etc. (The former has an alias
though, redirecting to docs.)


-leif


Paul Masson

Aug 17, 2016, 9:13:02 PM
to sage-devel, h...@sagemath.com
Harald got access to the server hosting combinat.sagemath.org and added a server-side redirect to the corresponding documents on doc.sagemath.org. Hopefully that will speed up the indexing by Google of the new documentation location.
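
The redirect logic can be sketched in a few lines of Python. This is only an illustration of the idea, not the configuration Harald deployed: the one-to-one mapping of paths under /doc/ onto doc.sagemath.org/html/en, the use of a 301 status, and the name `redirect_for` are all assumptions.

```python
# Path prefix of the old copy on combinat.sagemath.org (from this thread).
OLD_PREFIX = "/doc/"
# Target location named in this thread.
NEW_BASE = "http://doc.sagemath.org/html/en/"


def redirect_for(old_path):
    """Map a request path on the old host to (status, location).

    Returns None for paths outside /doc/, which are left alone.
    Assumption: the rest of the path is identical on the new host.
    """
    if not old_path.startswith(OLD_PREFIX):
        return None
    rest = old_path[len(OLD_PREFIX):]
    # 301 marks the move as permanent (an assumption; the thread only
    # says "server-side redirect").
    return 301, NEW_BASE + rest


print(redirect_for("/doc/reference/index.html"))
```

A real deployment would normally express this as a rewrite rule in the web server configuration rather than in application code.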

Paul Masson

Oct 19, 2016, 6:24:17 PM
to sage-devel, h...@sagemath.com
In case no one else has noticed, doc.sagemath.org is now showing up in Google searches, including the link that initiated this thread.

Google taketh away, and Google giveth back.



Samuel Lelievre

Oct 20, 2016, 4:13:06 AM
to sage-devel, h...@sagemath.com
2016-10-20 00:24:17 UTC+2, Paul Masson:

> doc.sagemath.org is now showing up in Google searches,
> including the link that initiated this thread.

Hooray! This is great news for all SageMath users.

Thanks Paul and Harald for your efforts to make it happen.
And thanks to anyone else involved (e.g. in the combinat team).

Johan S. H. Rosenkilde

Oct 20, 2016, 4:34:47 AM
to sage-...@googlegroups.com
> In case no one else has noticed, doc.sagemath.org is now showing up in
> Google searches, including the link that initiated this thread.

Indeed I have noticed! It's really good news for us and - especially -
for new Sage users :-)

Best,
Johan
--

kcrisman

Oct 20, 2016, 9:36:07 PM
to sage-devel


On Wednesday, October 19, 2016 at 6:24:17 PM UTC-4, Paul Masson wrote:
> In case no one else has noticed, doc.sagemath.org is now showing up in Google searches, including the link that initiated this thread.
>
> Google taketh away, and Google giveth back.


Boy, I sure hope you are right - even yesterday I tried to get a canonical link for something in a thematic tutorial and ended up getting sent to all kinds of old versions living in random places on the web.  Good!

Paul Masson

Oct 20, 2016, 10:15:26 PM
to sage-devel
Searching for Sage on Google was hit-and-miss for me yesterday but much more consistent today. It apparently takes time for all Google servers to sync their search results and may take a couple more days to appear consistently.