Exports and downloads

Arun Prasad

Mar 31, 2026, 1:36:29 AM

to ambuda-discuss

Work in progress:

https://ambuda.org/texts/downloads/ contains two downloads: metadata.json (a list of all texts with source info, download URLs, authors, etc.) and tei-headers.xml (a list of the <teiHeader> elements from every document on the site).

The goal is programmatic access to the full library for easier ingestion by other projects.
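For anyone who wants to try this from a script, here is a minimal sketch of loading metadata.json. The exact file URL and the shape of the JSON are assumptions on my part (the thread only says the file lists all texts), so `METADATA_URL`, `fetch_metadata`, and `iter_texts` are hypothetical names, and `iter_texts` accepts either a top-level list or a dict wrapping one.

```python
import json
import urllib.request

# Assumption: the file lives directly under the downloads page.
# Check the page itself for the real link before relying on this.
METADATA_URL = "https://ambuda.org/texts/downloads/metadata.json"


def fetch_metadata(url=METADATA_URL):
    """Download the metadata file and decode it as JSON."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_texts(metadata):
    """Yield one dict per text.

    The schema is not documented in the thread, so this handles both a
    top-level list and a dict whose values include lists of entries.
    """
    if isinstance(metadata, list):
        yield from (m for m in metadata if isinstance(m, dict))
    elif isinstance(metadata, dict):
        for value in metadata.values():
            if isinstance(value, list):
                yield from (m for m in value if isinstance(m, dict))
```

Calling `sorted(next(iter_texts(fetch_metadata())))` should print the field names of the first entry, which is probably the quickest way to see what the file actually provides.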

Still missing:

- bulk downloads of the texts themselves. Certainly XML, likely TXT, not sure about other formats (e.g. probably not PDFs in each choice of script)

- `updated_at` to see if a text changed

- license information (CC0 1.0 Universal vs. various combinations of BY/NC/SA)

Arun

Arun Prasad

Mar 31, 2026, 2:25:25 PM

to ambuda-discuss

Added bulk downloads and `updated_at`.
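A rough sketch of how a downstream project might use `updated_at` to re-download only what changed. This assumes each metadata entry is a dict with an ISO 8601 `updated_at` string and some stable identifier; the `slug` key, `parse_timestamp`, and `changed_texts` are hypothetical and not confirmed by anything in this thread.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 string (assumed format), accepting a trailing 'Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def changed_texts(entries, last_seen, key="slug"):
    """Return the entries that are new or newer than the last sync.

    entries:   list of dicts, one per text, from the metadata file
    last_seen: dict mapping identifier -> updated_at string from the last sync
    key:       name of the identifier field (an assumption; adjust as needed)
    """
    changed = []
    for entry in entries:
        ident = entry.get(key)
        stamp = entry.get("updated_at")
        if ident is None or stamp is None:
            # Cannot compare, so err on the side of re-downloading.
            changed.append(entry)
            continue
        previous = last_seen.get(ident)
        if previous is None or parse_timestamp(stamp) > parse_timestamp(previous):
            changed.append(entry)
    return changed
```

After each sync, store `{entry[key]: entry["updated_at"]}` for every entry and pass it back in as `last_seen` next time. The comparison assumes all timestamps are either consistently timezone-aware or consistently naive.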

Still TODO:

- bulk download is missing some texts; need to debug why.

- license information

Arun Prasad

Apr 2, 2026, 4:39:27 PM

to ambuda-discuss

- All texts available through bulk export.

- Most texts have license information.

- Plain text downloads now have better structure.

Still much more to do here, but this is enough for people to download the library for themselves.
