Sindice doesn't reindex certain URLs


Ruben

Aug 14, 2013, 12:17:33 PM
to sindi...@googlegroups.com

Dear all,

In an attempt to remove some old pages that contained outdated metadata,
I've tried making them return a 404 or 410 and then pinging Sindice with them.
Since that didn't work, I decided to fill them with some completely unrelated RDF,
hoping that this would be picked up instead of the old contents.

However, reindexing doesn't work for the following pages;
they maintain their old contents from 2011-05-16.
Maybe their unusual URLs have something to do with that?
(They seem to be an artefact from faulty redirections.)

Note in particular how the Sindice search results create extra "amp;"s in the URL.
I think that the ampersand is HTML-escaped instead of URL-escaped, giving rise to the inception-esque URLs.
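
To illustrate what I suspect is happening, here is a minimal sketch. It assumes that the URL gets HTML-escaped again on each pass where it should have been URL-escaped once; I have not seen Sindice's code, so this is a guess at the mechanism, and the example.org URL and the helper name are made up for illustration.

```python
import html
from urllib.parse import quote

# Hypothetical URL, not one of the affected pages.
url = "http://example.org/page?a=1&b=2"


def html_escape_repeatedly(value: str, times: int) -> str:
    """Apply HTML escaping `times` times in a row (the suspected bug)."""
    for _ in range(times):
        # quote=False: only &, < and > are escaped, quotes are left alone.
        value = html.escape(value, quote=False)
    return value


# Each extra pass turns "&amp;" into "&amp;amp;", and so on.
once = html_escape_repeatedly(url, 1)
twice = html_escape_repeatedly(url, 2)

# URL-escaping the whole URL (e.g. to pass it as a parameter value)
# encodes the ampersand as %26 instead.
url_escaped = quote(url, safe="")

print(once)
print(twice)
print(url_escaped)
```

If this guess is right, every additional escaping pass adds one more "amp;" after the ampersand, which would match the URLs in the search results.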

It would be great if you could help me reindex those URLs,
or, even better, completely remove them :-)

Best,

Ruben

Giovanni Tummarello

Aug 14, 2013, 1:20:33 PM
to sindice-dev
Hi Ruben,
the 404+ping should really work, we'll look into this as soon as the
person that knows how to follow the flow is back from the summer
leave.
sorry for the inconvenience

Gio
> --
> --
> You received this message because you are subscribed to the Google
> Groups "Sindice Developers" group.
> To post to this group, send email to sindi...@googlegroups.com
> To unsubscribe from this group, send email to
> sindice-dev...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/sindice-dev?hl=en
>
> http://sindice.com http://sig.ma http://www.deri.ie
>

Ruben

Aug 14, 2013, 1:27:52 PM
to sindi...@googlegroups.com, giovanni....@deri.org

Dear Gio,

Thanks. I’ve turned them into 404s now.

Basically, everything from domain ruben.3click.be will now return 404,
so anything that remains in the index indicates that something went wrong.

Best,

Ruben

Giovanni Tummarello

Aug 14, 2013, 2:42:27 PM
to sindice-dev
+ ping :)
gotta ping them
Gio

Ruben

Aug 14, 2013, 2:44:58 PM
to sindi...@googlegroups.com, giovanni....@deri.org

I do… I’ve pinged them almost daily for the past couple of months ;-)
That’s exactly the problem: pinging doesn’t help (whether I return 200/301/404/410).

Ruben

Ruben

Sep 14, 2013, 6:57:22 AM
to sindi...@googlegroups.com, giovanni....@deri.org
Hi Gio,


> the 404+ping should really work, we'll look into this as soon as the
> person that knows how to follow the flow is back from the summer leave.

Any update on this?

Thanks,

Ruben 