Coincidentally, I've been working on the HuggingFace Transformer import for the last couple of weeks, because I needed to import and run the KaLM-Gemma3 12B model.
The code to read from HuggingFace is entirely within the go-huggingface project. I created the KaLM-Gemma3 example here. It uses the `models/transformer` library in go-huggingface to load the configurations and weights, then calls the actual implementation in GoMLX's `pkg/ml/model/transformer`.
Fundamentally, it lays out a framework for the implementation. Currently it implements only one of the transformer "architectures", and supports the set of configurations I encountered while implementing my model. It now needs support for more models, both to broaden coverage and to check whether the current abstractions suffice (likely not, as this was a first attempt) or require refactoring.
There is a first attempt at it in GoMLX's `pkg/ml/decode`, but it has at least a couple of serious issues:
- Bucketing: GoMLX still supports only fixed shapes and requires JIT recompilation for each new shape, making bucketing essential for performance. I attempted to address this in go-huggingface's `tokenizers/bucket`.
- KVCache: It is currently stored as variables in the context. But that doesn't work if more than one inference runs simultaneously, so it's the wrong API. I've been postponing this because I also want to implement a paged KVCache approach, which makes more efficient use of memory but is more complicated.
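To illustrate the bucketing point above: since each distinct shape triggers a JIT recompilation, one can pad sequence lengths up to a small set of bucket sizes so only a bounded number of programs ever get compiled. The sketch below (my own illustration, not the actual `tokenizers/bucket` API) rounds a length up to the next power of two, capped at a maximum:

```go
package main

import "fmt"

// bucketLen returns the padded sequence length for an input of length n.
// Because a fixed-shape backend compiles one program per shape, grouping
// lengths into a few buckets (powers of two here, capped at maxLen) bounds
// the number of JIT recompilations at the cost of some padding.
func bucketLen(n, maxLen int) int {
	b := 1
	for b < n {
		b *= 2
	}
	if b > maxLen {
		b = maxLen
	}
	return b
}

func main() {
	for _, n := range []int{5, 17, 100, 2000} {
		fmt.Printf("len=%d -> bucket=%d\n", n, bucketLen(n, 1024))
	}
}
```

With, say, a cap of 1024, any batch of prompts compiles at most 11 programs (buckets 1, 2, 4, ..., 1024), regardless of how many distinct prompt lengths show up.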
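On the KVCache point: the fix for concurrent inference is to make the cache a per-request value rather than variables in a shared context. A minimal sketch of that shape of API, with simplified flat-slice "tensors" (this is my own illustration, not GoMLX's actual types):

```go
package main

import "fmt"

// KVCache holds the key/value projections for one attention layer of one
// request. Each concurrent inference owns its own KVCache, so requests
// never step on each other, unlike cache stored as context variables.
type KVCache struct {
	headDim int
	keys    [][]float32 // one entry per cached token position
	values  [][]float32
}

func NewKVCache(headDim int) *KVCache {
	return &KVCache{headDim: headDim}
}

// Append stores the key/value projections of the newest token.
func (c *KVCache) Append(k, v []float32) {
	c.keys = append(c.keys, k)
	c.values = append(c.values, v)
}

// Len reports how many token positions are cached.
func (c *KVCache) Len() int { return len(c.keys) }

func main() {
	// Two concurrent requests, each with an independent cache.
	a, b := NewKVCache(4), NewKVCache(4)
	a.Append(make([]float32, 4), make([]float32, 4))
	a.Append(make([]float32, 4), make([]float32, 4))
	b.Append(make([]float32, 4), make([]float32, 4))
	fmt.Println(a.Len(), b.Len())
}
```

A paged variant would replace the append-only slices with fixed-size blocks allocated from a shared pool plus a per-request block table, which is what makes it more memory-efficient and also more complicated.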
Dynamic Shapes Support
The upcoming priority (for this quarter) for GoMLX is to add support for dynamic shapes. That includes:
- Adding dynamic-shape support to the Go backend (`simplego`).
- Refactoring the GoMLX `/backends` package into its own repo, so other projects can use it independently of the rest of GoMLX.
- Adding ONNXRuntime as a backend: we can already convert ONNX to GoMLX; now we want to be able to execute a GoMLX computation using ONNX. And maybe llama.cpp as a backend as well.
Just a heads-up that this is coming.
---
Sorry for the long email. I appreciate the enthusiasm: working with ML/AI is super fun, full of surprises, and one learns a lot. My recommendation is to pick a model or project that uses GoMLX and that you want to see through. At first, work on GoMLX just to fill in the missing functionality you need. In other words, start with small changes and get used to developing with or in GoMLX before attempting anything large.
Find me on Slack if you have something in mind and want to chat at some point.
cheers!