Today I’m breaking down the monolithic transcribe server so it can use the new dynamic registry. My first step is building a configuration API endpoint that lets the frontend pass tenant-specific API keys and model choices directly to the backend.
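As a rough sketch only, the payload handling behind such an endpoint might look like the code below. The field names (`tenant_id`, `api_key`, `model`), the `TenantConfig` class, and the in-memory `TENANT_CONFIGS` store are all hypothetical placeholders, since the real schema and the route wrapping it are not settled yet.

```python
from dataclasses import dataclass

# Hypothetical field names; the real payload schema is not defined yet.
REQUIRED_FIELDS = ("tenant_id", "api_key", "model")


@dataclass(frozen=True)
class TenantConfig:
    tenant_id: str
    api_key: str
    model: str


def parse_tenant_config(payload: dict) -> TenantConfig:
    """Validate a decoded JSON payload and return a TenantConfig."""
    # Treat absent and empty values the same way: both count as missing.
    missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return TenantConfig(**{name: str(payload[name]) for name in REQUIRED_FIELDS})


# In-memory store keyed by tenant; a stand-in for whatever the registry uses.
TENANT_CONFIGS: dict[str, TenantConfig] = {}


def store_tenant_config(payload: dict) -> TenantConfig:
    """Validate the payload, then remember the config for that tenant."""
    config = parse_tenant_config(payload)
    TENANT_CONFIGS[config.tenant_id] = config
    return config
```

A route handler would decode the request body as JSON, call `store_tenant_config`, and turn a `ValueError` into a 400 response.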
Next, I’ll extract the hardcoded Whisper and PyTorch logic into an isolated plugin, which should substantially reduce our server boot time. Once that foundation is set, I’ll refactor the audio worker to act purely as a data router, and finally, chain our speech-to-text output directly into translation for real-time interpretation.
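One way the plugin extraction could help boot time is by deferring the heavy import until a plugin is first requested. The sketch below illustrates that idea under my own assumptions: `PluginRegistry` and `lazy_module` are hypothetical names, not the registry's actual API, and the standard-library `json` module stands in for the Whisper/PyTorch plugin module.

```python
import importlib
from typing import Any, Callable


class PluginRegistry:
    """Hypothetical registry that loads each plugin on first use."""

    def __init__(self) -> None:
        self._loaders: dict[str, Callable[[], Any]] = {}
        self._loaded: dict[str, Any] = {}

    def register(self, name: str, loader: Callable[[], Any]) -> None:
        # Registering stores only the loader; nothing is imported yet.
        self._loaders[name] = loader

    def get(self, name: str) -> Any:
        # Run the loader once, then serve the cached plugin afterwards.
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]


def lazy_module(module_name: str) -> Callable[[], Any]:
    """Return a loader that imports the module only when called."""
    return lambda: importlib.import_module(module_name)


registry = PluginRegistry()
# "json" is a lightweight stand-in for the heavy plugin module.
registry.register("demo", lazy_module("json"))
```

With this shape, server startup pays only for `register` calls, and the first request that calls `registry.get("demo")` pays the import cost.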
3. What is blocking me from making progress?
No strict technical blockers on my end right now. However, I am awaiting maintainer review and merge of the core architecture PRs (#61, #64, #66). Getting these merged soon will give me a clean base branch for the upcoming Flask endpoint PRs, though I can continue building the endpoint locally in the meantime.