Combining local and hosted LLMs to minimize token cost

Bruce Mcpherson

Jun 5, 2026, 8:14:38 AM

to Google Apps Script Community

Minimize AI token cost by using a hosted LLM (e.g. Gemini) as a strategic planner and a local model (e.g. Gemma) as the executor.

As an open source developer, my work is voluntary and unpaid, so I have to balance the potential token cost, which comes out of my own pocket, against the value of any time I might save.

This article is about combining the planning capability of Antigravity with a free local model (in this example, Gemma running under oMLX on a Mac) that does the grunt work. This way my Gemini costs are minimal, and the heavy local work is free.
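
To give a rough feel for the split, here is a minimal sketch of the planner/executor pattern. It is not the code from the article: `run_task`, `Usage`, `fake_planner` and `fake_executor` are hypothetical names, the two model calls are stubbed out so it runs without any network or model, and character counts are only a crude stand-in for tokens.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Usage:
    hosted_chars: int = 0  # crude proxy for billable hosted tokens
    local_chars: int = 0   # local work, no per-token charge


def run_task(goal: str,
             planner: Callable[[str], List[str]],
             executor: Callable[[str], str],
             usage: Usage) -> List[str]:
    """Ask the hosted planner once for a short step list, then run each step locally."""
    steps = planner(goal)  # one small hosted call
    usage.hosted_chars += len(goal) + sum(len(s) for s in steps)

    results = []
    for step in steps:
        out = executor(step)  # the heavy work stays on the local model
        usage.local_chars += len(step) + len(out)
        results.append(out)
    return results


# Hypothetical stand-ins: replace these with real calls to a hosted
# planner and a local model server.
def fake_planner(goal: str) -> List[str]:
    return [f"outline: {goal}", f"write code for: {goal}"]


def fake_executor(step: str) -> str:
    return f"[local output for '{step}'] " + "x" * 200


usage = Usage()
results = run_task("add a unit test", fake_planner, fake_executor, usage)
print(usage)
```

The point of the design is that the hosted model only ever sees the goal and returns a short plan, while the long outputs are generated locally, so the billable side stays small.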

https://ramblings.mcpher.com/combining-local-and-hosted-llm-to-minimize-token-cost/
