Technical issues


Arun Prasad

Mar 26, 2026, 2:06:19 AM
to ambuda-discuss
Setting aside all feature work, there are three big technical issues remaining:

1. Managing the current Flask + Celery setup, which consumes too much memory until the site comes to a crawl and restarts. Options are

(a) figure out why it's consuming so much memory (1.5G at peak) and tame the code,
(b) pay for a nicer server and avoid this problem entirely,
(c) re-architect the site to avoid these problems (more static assets, switch to go/rust, etc)

2. Support blue/green deployments

Right now the site is on bare metal, and when I deploy, the site goes down. This is actually worse than it sounds because if I'm (e.g.) updating a template, the templates are updated before the code is. This isn't an atomic deploy, and it should be. Various PaaS solutions are available if there's no easy way to do this on commodity servers.
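
One common way to get an atomic switch on a single commodity server is to unpack each release into its own directory and repoint a symlink with a rename. The sketch below assumes a hypothetical `releases/<version>` layout and a `current` link that the web server reads from; neither is described in this thread.

```python
import os
import tempfile


def activate_release(release_dir, current_link):
    """Point `current_link` at `release_dir` in one atomic step."""
    # Build the new symlink under a temporary name next to the live one.
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    # On POSIX, os.replace is a rename, which swaps the link atomically:
    # readers see either the old release or the new one, never a mix.
    os.replace(tmp_link, current_link)


def demo():
    """Create two hypothetical releases and switch between them."""
    root = tempfile.mkdtemp()
    old = os.path.join(root, "releases", "v1")
    new = os.path.join(root, "releases", "v2")
    os.makedirs(old)
    os.makedirs(new)
    current = os.path.join(root, "current")
    activate_release(old, current)
    before = os.readlink(current)
    activate_release(new, current)
    after = os.readlink(current)
    return before, after
```

Because templates and code would live in the same release directory, they switch together. Running worker processes still hold the old code in memory until they are reloaded, so this covers only the on-disk half of the problem.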

3. Figure out a scaling strategy

We use a single DigitalOcean droplet, which is fine for our current scale of roughly 1 request per second, but I have no clue how the site will do at 100x that. This might involve anything up to and including moving off of DO entirely.

Arun

Bakul Shah

Mar 31, 2026, 1:15:11 AM
to Arun Prasad, ambuda-discuss
On Mar 25, 2026, at 11:06 PM, Arun Prasad <aru...@gmail.com> wrote:

> Setting aside all feature work, there are three big technical issues remaining:
>
> 1. Managing the current Flask + Celery setup, which consumes too much memory until the site comes to a crawl and restarts. Options are
>
> (a) figure out why it's consuming so much memory (1.5G at peak) and tame the code,
> (b) pay for a nicer server and avoid this problem entirely,
> (c) re-architect the site to avoid these problems (more static assets, switch to go/rust, etc)

Have you looked at Hugo? https://gohugo.io/documentation/
There are lots of themes for it + it is very fast if you have mostly static assets. Lots of organizations + people are using it.

> 2. Support blue/green deployments
>
> Right now the site is on bare metal, and when I deploy, the site goes down. This is actually worse than it sounds because if I'm (e.g.) updating a template, the templates are updated before the code is. This isn't an atomic deploy, and it should be. Various PaaS solutions are available if there's no easy way to do this on commodity servers.

With Hugo this should be less of an issue even if you have thousands of pages.

> 3. Figure out a scaling strategy
>
> We use a single DigitalOcean droplet, which is fine for our current scale of roughly 1 request per second, but I have no clue how the site will do at 100x that. This might involve anything up to and including moving off of DO entirely.

Wait, is the site on bare metal (physical h/w colocated somewhere) or a droplet? Anyway, 1 request/sec seems far too low to worry about.

> Arun


Arun Prasad

Mar 31, 2026, 2:21:02 PM
to ambuda-discuss
Thanks for the tip on Hugo. It will help partially but not completely. And my mistake, I meant a droplet, not bare metal.

Some more notes on the site architecture might make the problem clearer.

We currently use a Python backend (Flask) to handle all requests. Texts and dictionaries are mostly static content, but we also have a proofing interface that needs read-write support, and small features in the library need read-write as well (e.g. user settings and bookmarks). For redundancy, we have two Python web workers served through gunicorn. They share some memory due to copy-on-write, but I think they still use redundant memory due to separate imports of SQLAlchemy and other libraries, in addition to other side data.

Ambuda also handles a variety of async tasks for things like running OCR, splitting PDFs into separate images, and calling other APIs. These touch the database and likewise cause imports of SQLAlchemy and other libraries.

In total Ambuda is around 5-6 Python processes, each of which pays the 40MB cost of the Python interpreter, additional cost for Python imports and side data, and normal memory consumption for handling requests and async tasks. This is not efficient.
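
At 40MB each, 5-6 interpreters account for roughly 200-240MB before any imports. To see what an individual import adds on top of that, one option is the standard library's `tracemalloc`. The following is a minimal sketch, not the project's tooling; `import_cost_bytes` is a hypothetical helper name.

```python
import importlib
import tracemalloc


def import_cost_bytes(module_name):
    """Return bytes of Python allocations retained after importing a module."""
    tracemalloc.start()
    try:
        before, _peak = tracemalloc.get_traced_memory()
        importlib.import_module(module_name)
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    # A module that was already imported costs (almost) nothing here.
    return max(after - before, 0)


if __name__ == "__main__":
    for name in ("json", "decimal", "sqlite3"):
        print(name, import_cost_bytes(name))
```

Two caveats: a module that is already loaded reports close to zero, so this is best run in a fresh interpreter, and `tracemalloc` only sees memory allocated through Python's allocators, so memory that a C extension allocates directly does not show up.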

So while texts could be served as static files, this doesn't fix the core problem, which is the resource-heavy proofing setup that needs async workers to process PDFs, run OCR, tokenize, run reports, etc.

Some options I am mulling over:
- spend some time on removing or optimizing heavy imports and data.
- switch to an async Python framework, which means we can reduce the number of Celery processes we create by putting more async tasks in the web backend itself.
- move off of Python to something like Go. I like Rust much more than Go but I value quick iteration and am wary of Rust's compilation times.
- throw money at the problem and get a better droplet.
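
To make the async-framework option concrete, here is a minimal sketch of background jobs running on the web process's own event loop via `asyncio`. The `run_ocr` function and the page numbers are hypothetical stand-ins, not Ambuda code.

```python
import asyncio


async def run_ocr(page):
    """Hypothetical stand-in for a slow task such as OCR on one page."""
    await asyncio.sleep(0.01)  # simulates waiting on an external API
    return f"page {page}: done"


async def worker(queue, results):
    """Pull jobs off the queue until cancelled."""
    while True:
        page = await queue.get()
        try:
            results.append(await run_ocr(page))
        finally:
            queue.task_done()


async def main():
    queue = asyncio.Queue()
    results = []
    # Two workers share one event loop instead of living in
    # separate worker processes.
    workers = [asyncio.create_task(worker(queue, results)) for _ in range(2)]
    for page in range(1, 5):
        queue.put_nowait(page)
    await queue.join()  # wait until every job has been processed
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return sorted(results)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

This only helps with I/O-bound work such as calling other APIs. CPU-bound work like OCR or PDF splitting would block the event loop and still needs a thread or process pool, and an in-memory queue loses pending jobs on restart.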

Arun

Arun Prasad

May 10, 2026, 1:12:47 AM
to ambuda-discuss
Previously I would have said the chance of a backend rewrite was around 1%. Now I'd say it's around 10%. Writing down some notes for later:

Problems with Python setup:
- complicated: web backend + Celery + Redis, meaning 3 different services just to run the site. Local dev needs a Docker image. If we load our kosha FST (word store) in memory, it is difficult to share across the web and Celery processes, which means adding another process to manage it (plus inter-process communication to read from the kosha).
- high memory usage (>1G across all services)
- web process is coupled to env changes on disk, e.g. if I remove library X used by prod, prod will fail until the deploy completes.
- no support for blue/green deployments unless we buy a beefier server with >2GB memory
- occasionally the site just slows to a crawl unless I restart the prod service via SSH to reduce excess memory usage (maybe due to Python heap fragmentation?)
- likewise, no room for heavier processes like Elasticsearch, unless we buy a beefier server.

Buying a bigger server is an easy way out for a few of these problems, but I don't like how brittle and complicated the setup is on bare metal.

Normally I would just buy a better server or pay for a PaaS, but it's easier than ever now to rewrite a project into a different language. Ideally we would have a single binary with <10% of the memory usage that deploys as a single file via rsync. Such a backend would be more performant in every way that matters and also make operations much simpler.

This suggests either Go or Rust as viable targets. Go is the obvious choice and excels at web backends, but I like Rust much more as a language, plus Vidyut is already in Rust which makes integrations very clean.

Basic capabilities we need that come to mind right now:
- sqlite read/write
- sqlite migrations
- user authentication
- form building
- the ability to run async tasks (currently we use celery for this)
- core libraries for parsing XML and splitting PDFs (mainly C++ based, I'm sure there are Go/Rust bindings for these)
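
As an illustration of how small the SQLite migration piece can be, here is a sketch that tracks the schema version in SQLite's built-in `PRAGMA user_version`. It is written in Python for brevity; the table, columns, and `MIGRATIONS` list are hypothetical, and this is not a description of what the site uses.

```python
import sqlite3

# Hypothetical migrations, applied in order; position (from 1) is the version.
MIGRATIONS = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE users ADD COLUMN email TEXT",
]


def migrate(conn):
    """Apply any migrations newer than the database's user_version."""
    conn.isolation_level = None  # manage transactions explicitly
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(MIGRATIONS, start=1):
        if version <= current:
            continue  # already applied
        conn.execute("BEGIN")
        try:
            conn.execute(statement)
            # PRAGMA does not accept bound parameters; version is an int.
            conn.execute(f"PRAGMA user_version = {version}")
            conn.execute("COMMIT")
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    print(migrate(db))
```

Running `migrate` a second time is a no-op, since every version is already at or below the stored `user_version`. The same pragma is available from any SQLite binding, so the approach carries over to Go or Rust.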

Important behaviors for dev:
- hot reloading, or at least a very tight loop between run and eval
- easy setup for new contributors

Factors I'm still weighing:
- the importance of contributor onboarding. Rust is a far more complicated language, but LLMs make it easier than ever to onboard and make changes.

Arun

Arun Prasad

May 15, 2026, 1:30:04 AM
to ambuda-discuss
I've decided to rewrite the site in Rust. Ultimately this came down to a few principles:

- Simplicity: A single binary is extremely appealing vs. our current setup of gunicorn, web workers, Celery workers, and Redis.
- Performance: A single binary in an efficient language like Rust will likely use 10% of the memory of our current setup, if not less.
- Blue-green deploys: smaller memory pressure means we can more easily launch multiple instances of the site and do a switchover without downtime.
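
The switchover described in the last point can be sketched as follows: start the new instance next to the old one, health-check it, and only then retire the old one. This toy version is in Python and uses two throwaway HTTP servers on free local ports. In a real deployment the traffic flip would happen in whatever sits in front of the app, which this thread does not specify; all names here are hypothetical.

```python
import http.client
import http.server
import threading


def start_instance(label):
    """Start a tiny HTTP server on a free port that answers with its label."""

    class Handler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            body = label.encode()
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):  # keep the demo quiet
            pass

    server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def is_healthy(server):
    """Return True if the instance answers GET / with a 200."""
    host, port = server.server_address[:2]
    conn = http.client.HTTPConnection(host, port, timeout=2)
    try:
        conn.request("GET", "/")
        return conn.getresponse().status == 200
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


def stop_instance(server):
    server.shutdown()      # stop the serve_forever loop
    server.server_close()  # release the port


def switch_over(active, candidate):
    """Return the instance that should receive traffic after a deploy."""
    if not is_healthy(candidate):
        stop_instance(candidate)
        return active       # keep serving from the old instance
    stop_instance(active)   # retire the old one only after the check passes
    return candidate
```

Both instances are alive at the same moment during the switch, which is why the memory footprint of a single instance matters so much here.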

Between Go and Rust, it came down to:

- Vidyut integrations: Vidyut is written in Rust, and we can integrate it into the site more tightly if everything is in one language.
- Correctness: Rust's type system is excellent for modeling specific domain constraints and ensuring we handle every case appropriately. This is useful for a project that cares deeply about making things correct and well-formed.
- Familiarity: I know and like Rust, and I am not personally fond of Go.

Go is easier to learn, compiles quickly, and is practically built for web backends. But LLMs greatly reduce the burden of working with a new language, debug builds in Rust are fast enough, and I'm more comfortable with Rust's data and async model.

The rewrite is around 50% done, and once it's 100% done and feels stable, I'll switch over.

Arun

Ganesan Sriram (GSR)

May 15, 2026, 1:21:55 PM
to ambuda-discuss

Predicated on laptops being more powerful, with at least 8 GB of RAM, Rust does give the option of running a lot in the browser itself via WebAssembly.
Perhaps something that could run off local documents like
could be considered.
Essentially, if current laptop and browser technology allows work to be offloaded to the client, then one should take advantage of it.

-
Sriram


Arun Prasad

May 19, 2026, 1:52:02 AM
to ambuda-discuss
The rewrite is around 90% complete. I have a local app that seems to have feature parity on all major flows: library, admin CRUD, OCR, publishing a text, etc.

What remains is careful testing and auditing. Then, once I have some time to spare, I'll update the server and deployment scripts to use the new code.