I've read the 2023 thread about metadata, and it explains most of it well, but I don't fully get the instruction to align the doc_id in the metadata.csv file with the doc_id in the corpus db. To do that, I'd have to know the corpus ids for each file ahead of time.
I did some digging in the db using SQLite, and I think (but could use confirmation) that the doc_ids are assigned in ascending alphanumeric order, first by folder and then by file name, with the caveat that numbers are sorted numerically rather than lexicographically (so "9" sorts before "10"). Additionally, punctuation sorts after numbers and letters, which is different from the sort order in Google Sheets (which, whatever) or BBEdit, which puts punctuation first.
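To make my guess concrete, here is a rough Python sketch of the ordering I think I'm seeing. It is only my hypothesis, not anything confirmed about how the tool assigns doc_ids. The function names and sample paths are made up for illustration, and the case-insensitive comparison and the folder-by-folder comparison are assumptions on my part.

```python
import re
from pathlib import PurePosixPath

# A run of ASCII digits is one token; any other character is its own token.
_TOKEN = re.compile(r"([0-9]+)|(.)", re.DOTALL)

def _text_key(text):
    """Sort key for one folder or file name under the guessed rules."""
    key = []
    for match in _TOKEN.finditer(text.lower()):  # assumption: case is ignored
        digits, char = match.groups()
        if digits is not None:
            key.append((0, int(digits), ""))  # numbers first, compared by value
        elif char.isalpha():
            key.append((1, 0, char))          # then letters
        else:
            key.append((2, 0, char))          # punctuation and the rest last
    return key

def guessed_doc_order_key(path):
    """Hypothetical key: folders first (one at a time), then the file name."""
    p = PurePosixPath(path)
    folders = [_text_key(part) for part in p.parent.parts]
    return (folders, _text_key(p.name))

paths = ["b/10.txt", "b/9.txt", "a/_notes.txt", "a/zeta.txt", "a/2.txt"]
ordered = sorted(paths, key=guessed_doc_order_key)
print(ordered)
```

If this matches the real order, I could sort the file paths in my metadata with this key and number the rows from there, instead of relying on BBEdit's sort.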
So two questions, I guess:
1. Am I correct that doc_id is assigned in alphanumeric order?
2. And if so, what is the actual order so I can tell BBEdit how to sort my metadata file?
I have 40,000 files in my corpus, so I can't just manually move a few rows around.
Thanks,
Jeff Prucher