Combining local and hosted LLMs to minimize token cost


Bruce Mcpherson

Jun 5, 2026, 8:14:38 AM
to Google Apps Script Community

Minimize AI token cost by using a hosted LLM (e.g. Gemini) as a strategic planner and a local model (e.g. Gemma) as the executor.


As an open source developer, my work is voluntary and unpaid, so I have to balance the potential token cost, paid at my own personal expense, against the value of any time I might save.


This article is about combining the planning capability of Antigravity with a free local model (in this example, Gemma running under oMLX on a Mac) doing the grunt work. This way my Gemini costs are minimal, and the heavy local work is free.
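
To make the planner/executor split concrete, here is a minimal Python sketch. It is not the setup from the article: the names run_hybrid, HybridRun, fake_planner and fake_executor are hypothetical, the two models are passed in as plain functions rather than real Gemini or Gemma calls, and character counts stand in as a rough proxy for tokens. The idea it illustrates is that the hosted model is called once to produce a plan, and every step of that plan is then handled by the local model.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumption: any function that takes a prompt and returns text can act as a model.
LLM = Callable[[str], str]


@dataclass
class HybridRun:
    results: List[str] = field(default_factory=list)
    hosted_chars: int = 0  # rough proxy for billable hosted tokens
    local_chars: int = 0   # work done by the free local model


def run_hybrid(task: str, planner: LLM, executor: LLM) -> HybridRun:
    """Call the hosted planner once, then run every planned step locally."""
    run = HybridRun()

    # One hosted call: ask for a plan, one step per line.
    plan_prompt = f"Break this task into short numbered steps:\n{task}"
    plan = planner(plan_prompt)
    run.hosted_chars += len(plan_prompt) + len(plan)

    # Each non-empty line of the plan becomes a prompt for the local model.
    steps = [line.strip() for line in plan.splitlines() if line.strip()]
    for step in steps:
        output = executor(step)
        run.local_chars += len(step) + len(output)
        run.results.append(output)
    return run


# Stand-ins for the real models, so the sketch runs without any network access.
def fake_planner(prompt: str) -> str:
    return "1. Read the file\n2. Summarize it"


def fake_executor(prompt: str) -> str:
    return f"done: {prompt}"


run = run_hybrid("Summarize README.md", fake_planner, fake_executor)
print(run.results)
print("hosted chars:", run.hosted_chars, "local chars:", run.local_chars)
```

In a real setup the two stand-in functions would be replaced by calls to the hosted and local models, and the hosted counter is the only one that costs money.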


https://ramblings.mcpher.com/combining-local-and-hosted-llm-to-minimize-token-cost/

Hybrid_LLM_Architecture_Overview.png
