Hi Mohamed,
Thank you again for your response.
If I understand correctly, when a Solr node is out of sync, there is no way of telling that from the response?
I'm assuming the "hitCount" would also be lower in that case (we didn't record it in the past but will in the future).
My main issue is that I don't know when I am missing data (without running the query again sometime in the future).
I think I didn't initially understand your comment regarding the revised preprint correctly. I think I understand it better now, and it has opened up new questions for another day.
It probably wasn't affecting the issue originally observed, but it did lead us to think about updates.
We played a bit with UPDATE_DATE and ran some queries on 5th October.
We selected a DOI indexed the previous day, which was included as expected in the following query:
"(DOI:10.31234/osf.io/xdbf9) (FIRST_IDATE:[2022-10-04 TO 2022-10-04]) (SRC:PPR)"
But it wasn't included when using UPDATE_DATE instead:
"(DOI:10.31234/osf.io/xdbf9) (UPDATE_DATE:[2022-10-04 TO 2022-10-04]) (SRC:PPR)"
Then again, it did include the article when using 5th October (it no longer does):
"(DOI:10.31234/osf.io/xdbf9) (UPDATE_DATE:[2022-10-05 TO 2022-10-05]) (SRC:PPR)"
That suggests there was an update on the 5th and that the filter only looks at the latest update.
But if we only ever retrieve data up to yesterday, we may never retrieve an article that keeps getting updated.
That is why we are considering combining it with FIRST_IDATE using an OR:
"(DOI:10.31234/osf.io/xdbf9) (FIRST_IDATE:[2022-10-04 TO 2022-10-04] OR UPDATE_DATE:[2022-10-04 TO 2022-10-04]) (SRC:PPR)"
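In case it helps, here is a minimal sketch of how we would build that combined query for a given day. The helper name build_daily_query and its parameters are our own (hypothetical) and not part of the API; it only assembles the query string shown above and does not send any request.

```python
from datetime import date
from typing import Optional


def build_daily_query(day: date, source: str = "PPR", doi: Optional[str] = None) -> str:
    """Assemble a query matching records first indexed OR last updated on `day`.

    Hypothetical helper: the field names (DOI, FIRST_IDATE, UPDATE_DATE, SRC)
    are taken from the queries in this email; everything else is an assumption.
    """
    d = day.isoformat()  # e.g. "2022-10-04"
    date_clause = f"(FIRST_IDATE:[{d} TO {d}] OR UPDATE_DATE:[{d} TO {d}])"
    parts = []
    if doi is not None:
        # Optional DOI restriction, only used for spot checks like the one above.
        parts.append(f"(DOI:{doi})")
    parts.append(date_clause)
    parts.append(f"(SRC:{source})")
    return " ".join(parts)


query = build_daily_query(date(2022, 10, 4), doi="10.31234/osf.io/xdbf9")
print(query)
```

Leaving out the doi argument would give the daily query for all preprints rather than the single test article.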
Some questions:
- Do you think that would be a good approach?
- What kind of updates would you expect?
Thank you
Daniel