Minimize AI token cost by using a hosted LLM (e.g. Gemini) as a strategic planner and a local model (e.g. Gemma) as the executor.
I'm an open source developer, and my work is voluntary and unpaid, so I have to weigh the potential token cost, which comes out of my own pocket, against the value of any time I might save.
This article is about combining the planning capability of Antigravity with a free local model (in this example, Gemma running under oMLX on a Mac) that does the grunt work. This way my Gemini costs are minimal, and the heavy local work is free.
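
As a rough illustration of the planner/executor split, here is a minimal Python sketch. It is not the setup from the article: `run_task`, `fake_planner` and `fake_executor` are hypothetical names, and the two stub functions stand in for the hosted model (Gemini) and the local model (Gemma) so the example runs without any API key or local server. The idea it shows is simply that the paid model is called once to produce a short plan, and every step of that plan is then handed to the free local model.

```python
from typing import Callable, List

# Hypothetical type alias: any function that takes a prompt and returns text.
Model = Callable[[str], str]


def run_task(task: str, planner: Model, executor: Model) -> List[str]:
    """Call the (paid) planner once, then the (free) executor once per step."""
    plan = planner(f"Break this task into short steps, one per line:\n{task}")
    # Keep only non-empty lines of the plan; each one becomes a unit of work.
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    return [executor(f"Task: {task}\nStep: {step}") for step in steps]


# Stubs standing in for the real models (assumptions, for illustration only).
def fake_planner(prompt: str) -> str:
    return "1. read files\n2. write summary"


def fake_executor(prompt: str) -> str:
    # Echo the last line of the prompt, which is the step being executed.
    return "done: " + prompt.splitlines()[-1]


if __name__ == "__main__":
    for result in run_task("summarize repo", fake_planner, fake_executor):
        print(result)
```

In a real setup the two stubs would be replaced by a call to the hosted model and a call to whatever endpoint the local model exposes. Because both are passed in as plain functions, the split between paid and free work stays in one place.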
https://ramblings.mcpher.com/combining-local-and-hosted-llm-to-minimize-token-cost/
