View and Download statistics in DSpace CRIS


Oliver Goldschmidt

Aug 11, 2017, 7:25:56 AM
to DSpace Technical Support
Hi all,

I'm struggling with the view and download counts in DSpace CRIS. I do not understand how they are generated and updated, or where they are stored.

To illustrate my problem, take Eurocris as an example: on the right side of the page http://dspacecris.eurocris.org/handle/11366/194 there are two boxes containing the number of views and the number of downloads. Clicking on a number leads to a page showing the statistics as a nice graph (http://dspacecris.eurocris.org/cris/stats/item.html?handle=11366/194). Here are my questions:
* In the box of my example I see "checked on Aug 10". How can I update the values? Do I need to run a script or is it happening automatically?
* I saw a table cris_metrics in the database. On my test system there are some values, which look reasonable to me. But they are not used on my page. Where is the data coming from? Or is some cache fooling me here?
* Is there any chance to import usage data from a logfile to generate these values retrospectively?

I know the page https://wiki.duraspace.org/display/DSPACECRIS/Metrics and have studied it carefully, but unfortunately I still cannot work out what is missing. Does anybody have an idea, or can anybody give me a hint on how to proceed, so that these metrics work on my test system?

Thank you, best regards
Oliver

Bollini Andrea

Aug 14, 2017, 3:20:44 AM
to Oliver Goldschmidt, DSpace Technical Support

Hi Oliver,

* In the box of my example I see "checked on Aug 10". How can I update the values? Do I need to run a script or is it happening automatically?
Yes, a few batch scripts are involved. Until now they were only listed on the page below; I have now added a link to it from the Metrics page:

https://wiki.duraspace.org/display/DSPACECRIS/jobs+in+CRONTAB

Usage statistics: ./view-and-download-retrieve
Weekly Jobs
./period-weekly-retrieve (compute the weekly variation of all the previous metrics)
Monthly Jobs
./period-monthly-retrieve (compute the monthly variation of all the previous metrics)
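
As a rough illustration of how the jobs listed above might be grouped for cron, the sketch below maps a period to its script. The DSPACE_BIN variable, its /dspace/bin default, the function names and the choice to treat the usage statistics job as "daily" are assumptions made for this example; the linked wiki page remains the authoritative list.

```shell
#!/bin/sh
# Sketch only. Assumptions: the scripts live in $DSPACE_BIN (default /dspace/bin)
# and the usage statistics job is run daily. Neither is stated in the thread.
DSPACE_BIN="${DSPACE_BIN:-/dspace/bin}"

# Print the script name(s) for a given period.
jobs_for() {
    case "$1" in
        daily)   echo "view-and-download-retrieve" ;;
        weekly)  echo "period-weekly-retrieve" ;;
        monthly) echo "period-monthly-retrieve" ;;
        *)       echo "usage: jobs_for daily|weekly|monthly" >&2; return 1 ;;
    esac
}

# Run every script for the given period, stopping at the first failure.
run_jobs() {
    jobs=$(jobs_for "$1") || return 1
    for job in $jobs; do
        "$DSPACE_BIN/$job" || return 1
    done
}
```

A cron entry could then source this file and call, for example, `run_jobs weekly`.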

* I saw a table cris_metrics in the database. On my test system there are some values, which look reasonable to me. But they are not used on my page. Where is the data coming from? Or is some cache fooling me here?

Yes, cris_metrics is the table where the data are permanently stored, but they are cached in memory and exposed to the application only via SOLR. The data are read when a new SOLR searcher is opened or when you explicitly force SOLR to flush the metrics cache (see the first info box in the documentation for details).

* Is there any chance to import usage data from a logfile to generate these values retrospectively?
Metrics are generated from the data available in the SOLR statistics core, so you can use any of the existing strategies to backload your data there and have it picked up for the metrics. See https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics+Maintenance#SOLRStatisticsMaintenance-DSpaceLogConverter

Hope this helps,
Andrea
-- 
Andrea Bollini
Chief Technology and Innovation Officer

4Science,  www.4science.it

Oliver Goldschmidt

Aug 14, 2017, 4:09:27 AM
to DSpace Technical Support, o.gold...@tu-harburg.de, Andrea....@4science.it
Hi Andrea,

Thank you for the information. I have run the scripts and they did their job without any error messages. I also ran a clearcache query on the search index, and I rebuilt and optimized it a couple of times (already last week). Unfortunately none of that helped: the metrics boxes on my item pages still say "checked on September 16, 2016" and show old data. I'm not sure what is still missing.
Which Solr cores are involved in these metrics tasks? I only investigated the search and statistics cores, but there is also an (empty) Solr index in temp/solr-index. Is this one important? Should I delete these cores and start all over with the metrics?

Best
Oliver

Oliver Goldschmidt

Aug 14, 2017, 5:02:08 AM
to DSpace Technical Support, o.gold...@tu-harburg.de, Andrea....@4science.it
I have just tried to start over and deleted the whole statistics and search cores from Solr. After that, I ran the scripts ./view-and-download-retrieve, ./period-weekly-retrieve and ./period-monthly-retrieve and rebuilt the search core (./dspace index-discovery). After all that had finished, I still see the old data in the metrics boxes, although the cris_metrics table is as I expected it. Obviously the old data is still somewhere deep inside the system, but I can't get it replaced with the new data...

Any suggestions?

Thanks
Oliver

Bollini Andrea

Aug 14, 2017, 5:10:28 AM
to Oliver Goldschmidt, DSpace Technical Support

Hi Oliver,

The statistics core is used to calculate the metrics values, and the search core is used to "access" the calculated values.

When I say that the search core is used to access the values, I mean that some dynamic fields in the search core schema are backed by a custom SOLR field that ultimately reads from the database (caching the values).

We should check whether the issue is in producing the data or in accessing them. Which data do you have in the database? Is a new row created after you execute the scripts?

Assuming you are checking the metrics of the item with ID 1, run the following query before and after executing the script dspace/bin/view-and-download-retrieve:

select * from cris_metrics where resource_type_id = 2 and resource_id = 1;

After the script has run, you should see two new rows.

If you don't see the two new rows, it could be useful to check what goes into your DSpace and SOLR logs during the script execution.
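
The before/after check described above could be scripted roughly as follows. This is only a sketch: the database name and user ("dspace"), the DSPACE_BIN default and the function names are assumptions, while the table and column names are taken as written in the query above.

```shell
#!/bin/sh
# Sketch only. Assumptions: PostgreSQL database "dspace", user "dspace",
# scripts in $DSPACE_BIN (default /dspace/bin).
DSPACE_BIN="${DSPACE_BIN:-/dspace/bin}"

# Build the query from the thread: $1 = select list, $2 = numeric item ID.
metrics_sql() {
    printf 'select %s from cris_metrics where resource_type_id = 2 and resource_id = %d;' "$1" "$2"
}

# Count the metric rows of one item (-A unaligned, -t tuples only).
count_rows() {
    psql -U dspace -d dspace -At -c "$(metrics_sql 'count(*)' "$1")"
}

# Count rows, run the retrieval script, count again and report the difference.
check_item() {
    before=$(count_rows "$1") || return 1
    "$DSPACE_BIN/view-and-download-retrieve" || return 1
    after=$(count_rows "$1") || return 1
    echo "rows before: $before, after: $after, new: $((after - before))"
}
```

Calling `check_item 1` would then report how many rows were added for item 1; according to the description above, two new rows are expected.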

Andrea

Bollini Andrea

Aug 14, 2017, 5:18:48 AM
to Oliver Goldschmidt, DSpace Technical Support

I saw your message just after sending my reply to the previous email.

You say "although the cris_metrics table is as I expected it".

So this means that you have recent rows, flagged as "last = true", with the freshly calculated values?

If you have optimized the SOLR index, the cache should (must) be gone. The only stronger way to make sure of that is to restart the Tomcat running SOLR, so please give that a try if you can.

Anyway, if you can exclude a cache problem and you still see the old data, it is likely that your SOLR instance is connected to a different database. Check the file

/dspace-install/solr/search/conf/database.properties (it is filled with the db connection values from your dspace.cfg at installation/update time)
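
One rough way to compare the two files is to extract the JDBC URL from each and check that they match. The sketch below does not rely on specific property names; it simply takes the first non-comment line containing "jdbc:" in each file. The function names are made up for this example, and the comparison is purely textual, so differently written but equivalent URLs would be reported as different.

```shell
#!/bin/sh
# Sketch only. Assumption: each file has one uncommented "key = jdbc:..." line
# holding the database connection URL.

# Print the value of the first non-comment line that contains a JDBC URL.
jdbc_url() {
    grep -v '^[[:space:]]*#' "$1" | grep 'jdbc:' | head -n 1 \
        | sed 's/^[^=]*=[[:space:]]*//; s/[[:space:]]*$//'
}

# Succeed only if both files yield the same, non-empty JDBC URL.
same_database() {
    a=$(jdbc_url "$1")
    b=$(jdbc_url "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example: `same_database /dspace-install/config/dspace.cfg /dspace-install/solr/search/conf/database.properties || echo "SOLR may be using a different database"`.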

Andrea
