Refactor SUSI Translator from a hardcoded system to a plugin-based, multi-tenant architecture that supports isolated AI models, real-time streaming, and dynamic session configuration.
Researched TTS options for the MVP.
Edge TTS (typically accessed via the edge-tts Python library) is a popular choice for MVPs because it acts as a free proxy to Microsoft's premium neural cloud voices without requiring an API key, an Azure subscription, or hosting costs. It is acceptable only at the MVP stage: Microsoft's official stance is that using their proprietary Edge Read-Aloud service in commercial applications without an Azure Cognitive Services subscription violates their terms of service. I also explored some lightweight open-source models that could replace it later:
1. Kokoro-82M
- License: Apache 2.0 (free for commercial use).
- Size: 82 million parameters (~300 MB in memory).
- Ecosystem: Packages like Kokoro-FastAPI allow you to run it inside a Docker container, creating an instant, drop-in API endpoint.
- Languages: Supports 9+ languages (English, Japanese, Korean, and major European languages) with curated style presets.
2. MeloTTS
- License: MIT License (free for commercial use).
- Languages: Supports English (with American, British, Indian, and Australian accents), Spanish, French, Japanese, Korean, and Chinese.
- Pros: Handles mixed-language strings well (e.g., a sentence containing both Chinese and English words).
3. Piper TTS
- License: MIT License.
- Languages: Supports over 100 voices across 40+ languages.
- Pros: Very low resource footprint; does not require Python at runtime.
I'm helping polish the MVP so it is ready for feedback by Monday. Today I'll focus on authentication and automated deployment work, and I'll review the Eventyay videos to understand the plugin integration requirements.