RAG pipelines or LLM agents

Andreas Nef


Jun 23, 2026, 6:16:25 AM

to AtoM Users

I'm wondering whether there are any examples of building an AI chat agent on top of AtoM, either by using the data through a RAG pipeline or by implementing an agent that uses the different API endpoints.

Best, Andreas

pieters...@gmail.com


Jun 23, 2026, 6:33:12 AM

to ica-ato...@googlegroups.com

Hi Andreas

I did this; see the orange circle at the bottom right of this page:

https://psis.theahg.co.za/heritage

Yes, we've built exactly this on top of AtoM, and it works well. A few notes from our experience that might help:

RAG over the catalogue. We index archival descriptions (title, scope & content, dates, named entities) into a vector store and run a hybrid retriever: semantic/vector search combined with keyword search (BM25 via the existing OpenSearch/Elasticsearch index), plus entity- and hierarchy-aware strategies. Hybrid matters a lot for archives, because pure vector search is poor at identifiers, reference codes, and exact name lookups, which users rely on. Answers are grounded in the retrieved descriptions and linked back to the actual records, so they are citable rather than hallucinated.
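As a rough illustration of the hybrid idea, here is a minimal sketch that merges a vector ranking and a keyword ranking with reciprocal rank fusion. RRF is only one common way to combine ranked lists and is an assumption here, not necessarily what any given deployment uses; the record IDs and the constant k=60 are likewise illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of record IDs into a single ranking.

    Each record scores 1 / (k + rank) per list it appears in, so records
    found by both retrievers rise above records found by only one.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, record_id in enumerate(ranking, start=1):
            scores[record_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the output is stable.
    return sorted(scores, key=lambda rid: (-scores[rid], rid))


# Hypothetical results from the two retrievers for the same query.
vector_hits = ["fonds-12", "item-881", "series-3"]
keyword_hits = ["item-007", "fonds-12", "item-881"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

The exact reference-code match (item-007 here) stays in the fused list even though the vector retriever missed it, which is the behaviour you want for identifier lookups.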

Chat agent. On top of that we run a conversational assistant over the collection (with conversation history), and the same retrieval layer feeds NER, summarisation, and translation pipelines.

Agent-over-API. Both approaches you mention are viable. AtoM's data is very accessible: the REST API, plus a GraphQL layer we added, make good "tools" for an agent. We route all model calls through a single gateway (keyed/metered, with self-hostable/offline models), which we'd strongly recommend: archives often can't send descriptions to a third-party cloud, and a gateway lets you swap models and keep an audit trail.
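To make the "API as tools" idea concrete, here is a minimal sketch of a tool registry that turns an agent's tool call into an HTTP request against the REST API. It only builds the request and does not send it. The host, the key, and the tool name are hypothetical, and the /api/informationobjects path, the sq0 query parameter, and the REST-API-Key header are assumptions that should be checked against the API documentation for your AtoM version.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(base_url, api_key, query, limit=10):
    """Build (but do not send) a search request for archival descriptions.

    Path, parameter names, and header name are assumptions; verify them
    against your AtoM version's API documentation.
    """
    params = urlencode({"sq0": query, "limit": limit})
    url = f"{base_url.rstrip('/')}/api/informationobjects?{params}"
    return Request(url, headers={"REST-API-Key": api_key})


# Registry mapping the tool names exposed to the model onto Python callables.
TOOLS = {"search_descriptions": build_search_request}


def dispatch(tool_call, base_url, api_key):
    """Resolve a model-issued tool call of the form {"name", "arguments"}."""
    tool = TOOLS[tool_call["name"]]
    return tool(base_url, api_key, **tool_call["arguments"])


# Hypothetical tool call as an agent framework might emit it.
call = {"name": "search_descriptions", "arguments": {"query": "land claims"}}
req = dispatch(call, "https://atom.example.org", "hypothetical-key")
print(req.full_url)
```

Keeping the registry explicit means the model can only reach the endpoints you have listed, which also gives the gateway one place to log every call.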

A few hard-won gotchas if you build the RAG path:

- Keep the vector index in sync with deletions. Deleted records that linger in the vector store surface as high-ranked phantom hits.

- Make sure the search index your retriever queries is sourced from the same corpus/DB you hydrate results from. An index-name mismatch gives you the classic "N results found, but nothing displays" symptom.

- Apply your access/embargo/publication filtering to the retrieval layer, not just the UI; otherwise the model can surface restricted material in its answers.
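The first and last points can be sketched as a single guard that runs on retrieved hits before anything reaches the model. This is a minimal sketch under stated assumptions: the Hit type, the two ID sets, and how you populate them (for example from the database and the publication status) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    record_id: str
    score: float


def visible_hits(hits, live_ids, published_ids):
    """Keep only hits that still exist AND may be shown to this user.

    Runs in the retrieval layer, so deleted or restricted records never
    enter the prompt, regardless of what the UI would hide later.
    """
    allowed = live_ids & published_ids
    return [hit for hit in hits if hit.record_id in allowed]


# Hypothetical retriever output: one stale (deleted) and one embargoed record.
hits = [
    Hit("item-881", 0.93),
    Hit("item-deleted", 0.91),
    Hit("item-embargoed", 0.88),
]
safe = visible_hits(
    hits,
    live_ids={"item-881", "item-embargoed"},
    published_ids={"item-881"},
)
print([hit.record_id for hit in safe])
```

A guard like this is a safety net rather than a substitute for removing deleted records from the vector store itself, since phantom entries still waste retrieval slots.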

Happy to go into more detail off list if useful.

Groete / Regards

Johan Pieterse (PhD)

The Archive and Heritage Group (Pty) Ltd

https://heratio.theahg.co.za/

jo...@theahg.co.za

+27 082 337-1406


Johan Pieterse


Jun 23, 2026, 6:43:12 AM

to AtoM Users

We also added a semantic search option.
Regarding the orange button: it is voice-activated, but you can also right-click it and ask questions.
