Exports and downloads


Arun Prasad

Mar 31, 2026, 1:36:29 AM
to ambuda-discuss
Work in progress:

https://ambuda.org/texts/downloads/ contains two downloads: metadata.json (a list of all texts with source info, download URLs, authors, etc.) and tei-headers.xml (a list of all <teiHeader> elements from all documents on the site).

The goal is programmatic access to the full library for easier ingestion by other projects.
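
For anyone who wants to try this, here is a minimal Python sketch of reading a locally saved copy of metadata.json. The file layout is an assumption, not something stated in this thread: the sketch accepts either a top-level JSON array or an object wrapping the array under a hypothetical "texts" key, and the "title" field name is also hypothetical. Adjust both to match the real file.

```python
import json
from pathlib import Path


def load_metadata(path):
    """Load a locally saved copy of metadata.json and return its entries as a list."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    # Assumption: the file is either a JSON array of text records, or an
    # object wrapping that array under a hypothetical "texts" key.
    if isinstance(data, dict):
        data = data.get("texts", [])
    return list(data)


def titles(entries):
    """Return the (hypothetical) "title" field of each entry that has one."""
    return [e["title"] for e in entries if isinstance(e, dict) and "title" in e]
```

Calling `titles(load_metadata("metadata.json"))` would then give a quick listing of the library, assuming the file has been downloaded next to the script.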

Still missing:

- bulk downloads of the texts themselves. Certainly XML, likely TXT, not sure about other formats (e.g. probably not PDFs in every choice of script)
- `updated_at` to see if a text changed
- license information (CC0 1.0 Universal vs. various combinations of BY/NC/SA)

Arun

Arun Prasad

Mar 31, 2026, 2:25:25 PM
to ambuda-discuss
Added bulk downloads and `updated_at`.

Still TODO:
- bulk download is missing some texts; need to debug why.
- license information
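
One way a downstream project might use `updated_at` is to compare two snapshots of the metadata and re-fetch only what changed. The sketch below is illustrative only: it assumes each entry is a dict with some stable identifier (the "slug" key here is hypothetical) and treats `updated_at` as an opaque value, checking only whether it differs.

```python
def changed_texts(old_entries, new_entries, key="slug"):
    """Return identifiers of texts that are new or whose updated_at differs.

    `key` names a hypothetical identifier field; change it to whatever
    field the real metadata uses to identify a text.
    """
    old = {e[key]: e.get("updated_at") for e in old_entries}
    changed = []
    for entry in new_entries:
        ident = entry[key]
        # A text counts as changed if it was not in the old snapshot,
        # or if its updated_at value is not equal to the old one.
        if ident not in old or old[ident] != entry.get("updated_at"):
            changed.append(ident)
    return changed
```

Comparing for inequality rather than ordering avoids depending on the timestamp format, which is not specified here.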

Arun Prasad

Apr 2, 2026, 4:39:27 PM
to ambuda-discuss
- All texts available through bulk export.
- Most texts have license information.
- Plain text downloads now have better structure.

Still much more to do here, but this is enough for people to download the library for themselves.