
(Respectfully) crawling through gopherspace?


Donnie Corbitt

Dec 16, 2020, 9:26:34 AM
Hi everyone! I want to crawl all of gopherspace and archive it periodically, while respecting robots.txt and users who don't want to be a part of the archive. What are some good resources on crawling parts of the internet other than the web? After some searching, every crawling tutorial I find is about the web, not other protocols.

Any info is appreciated, thank you! ^^

m...@ph.or

Dec 19, 2020, 2:44:19 PM
You might want to search the archives and/or re-post on the gopher-project list:

https://lists.debian.org/gopher-project/

I believe there are already a few subscribers that are doing periodic
Gopherspace crawls, mostly for keeping search engine DBs updated.

-M4

Mateusz Viste

Dec 21, 2020, 2:33:47 AM
Instead of inventing your own wheel, you could perhaps extend an
existing project.

OGUP is a simple engine that crawls the gophernet, only to keep a list
of active servers (and possibly discover new servers). In the process
it collects the content of directories, but throws it away. It would be
relatively easy to make it write the content of directories into some
database (relational or filesystem-based) for archiving purposes. Then,
extend the archiving activities to text files. I'd gladly provide you
with pointers if you'd like to work on that.
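
To illustrate the directory-collecting step described above, here is a minimal sketch in Python (not OGUP's actual C89 code). It assumes the plain RFC 1436 menu layout of one item per line: a type character, then display string, selector, host, and port separated by tabs. The names MenuItem, fetch, parse_menu, and crawlable are hypothetical and chosen only for this example.

```python
import socket
from typing import List, NamedTuple


class MenuItem(NamedTuple):
    item_type: str  # '1' = directory, '0' = text file, 'i' = info line, ...
    display: str
    selector: str
    host: str
    port: int


def fetch(host: str, selector: str = "", port: int = 70,
          timeout: float = 10.0) -> bytes:
    """Send one selector to a Gopher server and read until it closes."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(selector.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_menu(raw: bytes) -> List[MenuItem]:
    """Parse a raw directory listing, skipping malformed lines."""
    items = []
    for line in raw.decode("utf-8", errors="replace").splitlines():
        if line == ".":  # a lone dot terminates the listing
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4:
            continue
        try:
            port = int(fields[3])
        except ValueError:
            continue
        items.append(MenuItem(line[0], fields[0], fields[1], fields[2], port))
    return items


def crawlable(items: List[MenuItem]) -> List[MenuItem]:
    """Keep only directories and text files, the two kinds to archive."""
    return [item for item in items if item.item_type in ("0", "1")]
```

An archiver along these lines would call fetch() on each item returned by crawlable(), store the raw bytes keyed by (host, port, selector), and recurse into items of type '1'.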

OGUP is written in C89.


Mateusz

Mateusz Viste

Dec 21, 2020, 2:34:14 AM
gopher://gopher.viste.fr/1/ogup/

Mateusz

rtyler

Dec 21, 2020, 2:29:07 PM
Oh no! I'm not part of the observable universe!

Perhaps some day gopher.brokenco.de will be seen ;)



--
GitHub: https://github.com/rtyler

GPG Key ID: 0F2298A980EE31ACCA0A7825E5C92681BEF6CEA2

Dennis Boone

Dec 21, 2020, 8:29:16 PM

Mateusz Viste

Dec 22, 2020, 2:19:51 AM
2020-12-21 at 19:29 -0000, rtyler wrote:
> On 2020-12-21, Mateusz Viste <mat...@xyz.invalid> wrote:
> > 2020-12-21 at 08:33 +0100, Mateusz Viste wrote:
> >> 2020-12-16 at 06:26 -0800, Donnie Corbitt wrote:
> >>
> >> OGUP is written in C89.
> >
> > gopher://gopher.viste.fr/1/ogup/
>
>
> Oh no! I'm not part of the observable universe!
> Perhaps some day gopher.brokenco.de will be seen ;)

The crawler is temporarily offline due to... me not having much time to
take care of it (and I need to find a server to run it on since the
previous server died). I will add brokenco.de as soon as the crawler
starts running again.

Would be nice to have a "suggest a new server" feature as well some
day. And robots.txt support. So many things to do, so little time...
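
As a rough idea of what robots.txt support could look like, here is a minimal sketch in Python using the standard library's urllib.robotparser. It assumes a server publishes web-style rules (User-agent / Disallow lines) and that the crawler has already retrieved that text, for example under a selector named robots.txt; that convention is an assumption here, not something this thread specifies. The helper names load_rules and may_archive and the agent string "example-archiver" are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def load_rules(robots_text: str) -> RobotFileParser:
    """Build a rule set from the text of an already-fetched robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def may_archive(parser: RobotFileParser, selector: str,
                agent: str = "example-archiver") -> bool:
    """Treat the Gopher selector as a URL path and check it against the rules."""
    path = selector if selector.startswith("/") else "/" + selector
    return parser.can_fetch(agent, path)
```

A crawler would call may_archive() before requesting each selector and skip anything the server's rules disallow.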

Mateusz

Daniel

Jan 18, 2021, 5:50:52 AM
There's also a gopher project IRC channel on Freenode. Here are the
channels I join regularly:

#gopher
#gopherproject
#gophernicus

--
Daniel
Visit me at: gopher://gcpp.world