Hi,
WARC record IDs are unique to each WARC record. Records can be linked
via the "WARC-Concurrent-To" or "WARC-Refers-To" headers. For example, a
response record is linked to its request, a metadata record to a
response, etc. See the WARC format specification [1].
If the same URL is fetched twice in one crawl, the two WARC records will
have different UUIDs.
> add only new URLs in the subsequent indexes and update the existing
> ones
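Because the record ID changes with every fetch, it cannot tell you whether
a URL is already indexed. You need to use the URL as the document ID, or
alternatively a digest of the URL.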
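
A minimal sketch of the digest variant, in case it helps. The function
name doc_id_for_url and the choice of SHA-256 are only assumptions for
illustration; any stable digest works, and you may want to normalize the
URL before hashing.

```python
import hashlib


def doc_id_for_url(url: str) -> str:
    """Return a deterministic document ID: the hex SHA-256 digest of the URL.

    Hypothetical helper, not part of any WARC or indexing library.
    """
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


# Two captures of the same URL carry different WARC record IDs,
# but they map to the same document ID, so the second one updates
# the existing document instead of adding a duplicate.
first = doc_id_for_url("https://example.com/page")
second = doc_id_for_url("https://example.com/page")
other = doc_id_for_url("https://example.com/other")
```

The digest has a fixed length, which is convenient if your index limits
the size of document IDs.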
Best,
Sebastian
[1]
https://iipc.github.io/warc-specifications/specifications/warc-format/warc-1.1/