Hi all, and thanks for providing the REST service for EuropePMC!
I have some questions around the cursorMark value in GET /search results.
I've done some tests with different query parameters. It seems that even if the data changes (e.g., with rising hit counts), the nextCursorMark values stay stable for the same query run at different times when sorting by, e.g., "sort=P_DATE_D asc" (see below).
I'm looking to run search queries for my research and would like to gauge their reproducibility, ideally over a longer period of time. For example, I would hope that the same query, paged via nextCursorMark and sorted by publication date (for past years), produces the same results next year.
I realize that queries covering publication dates in the past year, the current year, or the future may be "unstable", as publications are still being added to the database.
- Q1: Is this true also for publication dates further in the past than, say, last year? I.e., can results for, e.g., 2014 change?
Given that cursor marks replaced "offset" (CIT-2221), I'm assuming that they are stable.
- Q2: Can you confirm this?
- Q3: Additionally, what is the likelihood/the parameters under which specific nextCursorMarks will change? E.g., with the next major release of epmc-rest? Changes in the data? Restarts of the service?
Many thanks and kind regards
Stephan
---
Overview of naive testing results
Testing for query="cancer", resultType="idlist", pageSize="1", sort="P_DATE_D asc" at different times today (3 iterations with different hitCounts), always gives me these results for the first ten pages:
- p. 1: cursorMark in query: "*" -> "id": "29139677", "nextCursorMark": "AoIH///6lIpLjAAoMzc0NTM0OTc="
- p. 2: cursorMark in query: "AoIH///6lIpLjAAoMzc0NTM0OTc=" -> "id": "29139674", "nextCursorMark": "AoIH///6lIpLjAAoMzc0NTM0OTQ="
- p. 3: cursorMark in query: "AoIH///6lIpLjAAoMzc0NTM0OTQ=" -> "id": "PMC5534590", "nextCursorMark": "AoIH///6lIpLjAAoMzc0NTA2MjQ="
- p. 4: cursorMark in query: "AoIH///6lIpLjAAoMzc0NTA2MjQ=" -> "id": "PMC5534606", "nextCursorMark": "AoIH///6lSnwsAAoMzc0NTA2Mzk="
- p. 5: cursorMark in query: "AoIH///6lSnwsAAoMzc0NTA2Mzk=" -> "id": "PMC5545433", "nextCursorMark": "AoIH///6mhKAYAAoMzc0NTA5NTQ="
- p. 6: cursorMark in query: "AoIH///6mhKAYAAoMzc0NTA5NTQ=" -> "id": "PMC5550175", "nextCursorMark": "AoIH///6mrIlhAAoMzc0NTEyMTU="
- p. 7: cursorMark in query: "AoIH///6mrIlhAAoMzc0NTEyMTU=" -> "id": "PMC5550171", "nextCursorMark": "AoIH///6mrIlhAAoMzc0NTEyMTE="
- p. 8: cursorMark in query: "AoIH///6mrIlhAAoMzc0NTEyMTE=" -> "id": "PMC5550125", "nextCursorMark": "AoIH///6mrIlhAAoMzc0NTExNjU="
- p. 9: cursorMark in query: "AoIH///6mrIlhAAoMzc0NTExNjU=" -> "id": "PMC5550113", "nextCursorMark": "AoIH///6mrIlhAAoMzc0NTExNTM="
- p. 10: cursorMark in query: "AoIH///6mrIlhAAoMzc0NTExNTM=" -> "id": "PMC5545520", "nextCursorMark": "AoIH///6ognWsAAoMzc0NTEwMzk="
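
In case it helps to reproduce the check above, here is a minimal Python sketch of the paging loop. It is only an illustration under some assumptions: the endpoint URL, the format=json parameter, and the response field names (resultList, result) are what I believe the service uses, not something confirmed in this post, and build_url, fetch_json and walk_pages are made-up helper names. The sort value is copied as written above.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint for GET /search; adjust if the service lives elsewhere.
BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_url(query, cursor_mark="*", sort="P_DATE_D asc", page_size=1):
    """Build a search URL; parameter names follow the test described above."""
    params = {
        "query": query,
        "resultType": "idlist",
        "pageSize": page_size,
        "sort": sort,  # value copied from the post
        "cursorMark": cursor_mark,
        "format": "json",  # assumption: JSON output is requested this way
    }
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Fetch a URL and parse the body as JSON (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def walk_pages(query, pages, fetch=fetch_json, **kwargs):
    """Follow nextCursorMark for up to `pages` pages.

    Returns a list of (cursorMark sent, ids returned, nextCursorMark) tuples,
    so two runs made at different times can be compared with `==`.
    """
    cursor = "*"
    trail = []
    for _ in range(pages):
        data = fetch(build_url(query, cursor_mark=cursor, **kwargs))
        # Assumed response layout: {"nextCursorMark": ..., "resultList": {"result": [...]}}
        results = data.get("resultList", {}).get("result", [])
        next_cursor = data.get("nextCursorMark")
        trail.append((cursor, [r.get("id") for r in results], next_cursor))
        # Stop when there is nothing more to page through.
        if not results or not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
    return trail
```

Calling walk_pages("cancer", 10) at two different times and comparing the returned lists would show at which page, if any, the ids or cursor marks start to diverge.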