Hi John,
This is Carlos; I'm the developer of Trending Papers. Let me try to answer some of your questions based on what I've learned from working with the API over the last 6 months:
- I always use the identifiers from the metadata. I can assure you that those never carry a version suffix (v2, v3, and so on). They are always of the form YYMM.######.
- An article posted on arXiv years ago (e.g. in 2015) might later be updated or retracted. If it's updated, it appears again in the metadata file for the day of the update (e.g. the file from the 28th). Detecting retractions is a bit harder, but entirely possible.
- On every day that arXiv publishes a new metadata file, I see roughly 450 new papers (in Computer Science) and roughly 150 updates. So handling updates properly is important.
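
To make the update handling concrete, here is a minimal Python sketch of one way to do it; it is not the actual Trending Papers code. It assumes each metadata record has already been parsed into a dict with an "id" field (a hypothetical shape, since the real file format isn't described here) and that identifiers use the modern scheme of four or five digits after the dot. The names base_id and merge_daily are made up for illustration. Stripping a version suffix is purely defensive, since the metadata identifiers shouldn't have one.

```python
import re

# Modern arXiv identifiers: YYMM.NNNN or YYMM.NNNNN, optionally followed by vN.
_ARXIV_ID = re.compile(r"^(\d{4}\.\d{4,5})(?:v\d+)?$")


def base_id(identifier):
    """Return the identifier without any version suffix (defensive)."""
    match = _ARXIV_ID.match(identifier.strip())
    if match is None:
        raise ValueError(f"unrecognized arXiv identifier: {identifier!r}")
    return match.group(1)


def merge_daily(store, records):
    """Upsert one day's records into `store`, keyed by base identifier.

    `records` is assumed to be an iterable of dicts with an "id" key
    (hypothetical shape). Returns (new_ids, updated_ids) so the caller
    can tell first-time papers apart from updates to older ones.
    """
    new_ids, updated_ids = [], []
    for record in records:
        key = base_id(record["id"])
        if key in store:
            updated_ids.append(key)  # seen before: this is an update
        else:
            new_ids.append(key)      # first time we see this paper
        store[key] = record          # the latest record always wins
    return new_ids, updated_ids
```

Keying the store on the bare identifier means an updated 2015 paper overwrites its old entry instead of showing up as a duplicate.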
Hope it helps!
Cheers,
Carlos