Statistics ip lists obsolete and update script not working

Jürgen, Claudia

Jan 17, 2023, 9:19:39 AM
to DSpace Technical Support
Hi all,

I noticed two things about the IP lists used by stats-util.

The lists are configured in:

https://github.com/DSpace/DSpace/blob/main/dspace/config/modules/solr-statistics.cfg

solr-statistics.spiderips.urls = http://iplists.com/google.txt, \
http://iplists.com/inktomi.txt, \
http://iplists.com/lycos.txt, \
http://iplists.com/infoseek.txt, \
http://iplists.com/altavista.txt, \
http://iplists.com/excite.txt, \
http://iplists.com/misc.txt


a) The lists are most likely obsolete, which makes the statistics very
imprecise with regard to bot traffic. See https://iplists.com/
The "last revised" dates on the site are from 2008 and 2014.
Maybe we need another source for IP lists and a "cleanup" of the statistics.

b) stats-util -u (which should, in theory, fetch updated list files) does
not work and throws a NullPointerException:
Getting: http://iplists.com/google.txt
To: /opt/dspace/dspace63tu/config/spiders/iplists.com-google.txt
- Error: null
java.lang.NullPointerException
    at org.apache.tools.ant.taskdefs.Get.doGet(Get.java:221)
    at org.apache.tools.ant.taskdefs.Get.execute(Get.java:134)
    at org.dspace.statistics.util.StatisticsClient.updateSpiderFiles(StatisticsClient.java:152)
    at org.dspace.statistics.util.StatisticsClient.main(StatisticsClient.java:80)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.dspace.app.launcher.ScriptLauncher.runOneCommand(ScriptLauncher.java:229)
    at org.dspace.app.launcher.ScriptLauncher.main(ScriptLauncher.java:81)

Sunny Greetings

Claudia

--
Claudia Juergen

Technische Universität Dortmund
Universitätsbibliothek
Bibliotheks-IT
Vogelpothsweg 76
44227 Dortmund

Tel.: +49 231-755 40 43
Fax: +49 231-755 40 32
claudia...@tu-dortmund.de
www.ub.tu-dortmund.de


Karol

Jan 21, 2023, 11:28:39 AM
to DSpace Technical Support
Hi Claudia,

I have exactly the same problem. Bumping this thread.

Best,

Karol

Mark H. Wood

Jan 23, 2023, 9:53:01 AM
to dspac...@googlegroups.com
This seems to have been fixed in https://github.com/DSpace/DSpace/issues/8528

The code relies on Ant's 'get' task to do the downloading. It appears
that we have been playing fast and loose with Ant's infrastructure,
and a shortcut that used to work now fails.

This single use drags all of Ant into the DSpace runtime. Maybe we
should be using Commons HttpComponents, which is found in a number of
places in DSpace, instead?
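
To make the idea concrete, here is a minimal sketch of a download step that
does not need Ant. It is not the DSpace code and not the fix from the issue
above. To stay dependency-free it uses the JDK's built-in
java.net.http.HttpClient (Java 11 or later) rather than HttpComponents. The
class and method names are hypothetical, and the file-naming rule (host, a
dash, then the last path segment) is only inferred from the "Getting:" and
"To:" lines in the log earlier in this thread.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.Duration;

// Hypothetical helper, not part of DSpace.
final class SpiderListFetcher {

    /** Builds the local file name: host + "-" + last path segment (assumed rule). */
    static String targetFileName(URI url) {
        String path = url.getPath();
        String leaf = path.substring(path.lastIndexOf('/') + 1);
        return url.getHost() + "-" + leaf;
    }

    /** Downloads one list into spiderDir, replacing the old file only on HTTP 200. */
    static Path fetch(HttpClient client, URI url, Path spiderDir)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(url)
                .timeout(Duration.ofSeconds(30))
                .GET()
                .build();
        // Write to a temporary file first so a failed download never
        // clobbers the list that is already in place.
        Path tmp = Files.createTempFile(spiderDir, "spider-", ".tmp");
        try {
            HttpResponse<Path> response =
                    client.send(request, HttpResponse.BodyHandlers.ofFile(tmp));
            if (response.statusCode() != 200) {
                throw new IOException("HTTP " + response.statusCode() + " for " + url);
            }
            Path target = spiderDir.resolve(targetFileName(url));
            return Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            Files.deleteIfExists(tmp); // no-op when the move succeeded
        }
    }

    public static void main(String[] args) {
        // Offline demonstration of the naming rule only; no network access.
        System.out.println(targetFileName(URI.create("http://iplists.com/google.txt")));
        // prints iplists.com-google.txt
    }
}
```

A caller would create one client, for example with
HttpClient.newBuilder().followRedirects(HttpClient.Redirect.NORMAL).build(),
and call fetch once per configured URL, reporting a failure for one list
without aborting the others.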

--
Mark H. Wood
Lead Technology Analyst

University Library
Indiana University - Purdue University Indianapolis
755 W. Michigan Street
Indianapolis, IN 46202
317-274-0749
www.ulib.iupui.edu

Claudia Jürgen

Jan 23, 2023, 10:17:11 AM
to dspac...@googlegroups.com
Hi Karol and all,

I have not had time to look into it. Most of the IP lists are no longer
free, so I wonder how we can clean up the statistics, for example by
replacing the lists with a new source, then flagging the bots and removing them.
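
The flagging step could look roughly like the sketch below. It is only an
illustration and not how DSpace does it: the class name is hypothetical, the
list format (one address per line, blank lines and '#' comments ignored) is
an assumption, and a real cleanup would run against the Solr statistics
records rather than an in-memory list of addresses.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of DSpace.
final class SpiderIpFilter {
    private final Set<String> spiderIps = new HashSet<>();

    /** Adds every address from the given lines, skipping blanks and '#' comments. */
    void load(List<String> lines) {
        for (String line : lines) {
            String ip = line.trim();
            if (!ip.isEmpty() && !ip.startsWith("#")) {
                spiderIps.add(ip);
            }
        }
    }

    /** True when the client address appears on one of the loaded lists. */
    boolean isSpider(String clientIp) {
        return spiderIps.contains(clientIp);
    }

    /** Returns only the client addresses that are not on a loaded list. */
    List<String> keepHuman(List<String> clientIps) {
        return clientIps.stream()
                .filter(ip -> !isSpider(ip))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        SpiderIpFilter filter = new SpiderIpFilter();
        // Documentation-range addresses, used purely as sample data.
        filter.load(List.of("# example list", "", "192.0.2.10", " 192.0.2.11 "));
        System.out.println(filter.keepHuman(List.of("192.0.2.10", "198.51.100.7")));
        // prints [198.51.100.7]
    }
}
```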

Sunny greetings

Claudia





Mark H. Wood

Jan 23, 2023, 10:53:34 AM
to dspac...@googlegroups.com
On Mon, Jan 23, 2023 at 04:17:07PM +0100, Claudia Jürgen wrote:
> I did not have time to look into it. Most of the ip list are not free
> anymore, so I wonder how we can clean up the statistics, like replacing
> them with a new source of lists and then flag the bots and remove them.

There is a PR for a new source: https://github.com/DSpace/DSpace/pull/2892