Hello,
I have been exploring the C++ generative AI inference tasks and would like to learn more about the roadmap for these features.
As I understand it, the process converts SafeTensors or PyTorch checkpoints into a FlatBuffers binary containing only weights and metadata, roughly a subset of the TFLite format. This binary is then executed by a minimal runtime (XNNPACK on CPU) that uses TFLite only to parse the FlatBuffers; the runtime itself contains the code that builds a subgraph out of XNNPACK primitives. Please correct me if any of this is inaccurate.
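One concrete detail behind this understanding: a TFLite FlatBuffers file carries the 4-byte file identifier "TFL3" immediately after the 4-byte root-table offset, which is how a parser can cheaply recognize the format before handing it to the runtime. A minimal sketch (the helper name is mine, not from MediaPipe or TFLite):

```python
def looks_like_tflite(blob: bytes) -> bool:
    """Heuristic check for a TFLite-flavored FlatBuffers binary.

    FlatBuffers places an optional 4-byte file identifier at bytes 4..8
    (right after the root-table offset); TFLite models use b"TFL3".
    """
    return len(blob) >= 8 and blob[4:8] == b"TFL3"

# Synthetic header: a placeholder root offset followed by the identifier.
fake_header = (8).to_bytes(4, "little") + b"TFL3"
print(looks_like_tflite(fake_header))   # True
print(looks_like_tflite(b"\x00" * 8))   # False
```

This only inspects the header, of course; actually walking the weights and metadata requires the generated FlatBuffers schema code, which is the part the runtime borrows from TFLite.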
Given this context, I have a couple of questions:
a) Is there a plan to open-source the converter code for the LLM tasks, so that custom or modified LLM models can be integrated? As far as I can tell, this is not available yet (see https://github.com/google/mediapipe/issues/5355: the internals live in a precompiled .so shipped in the pip package, and building that library from source does not seem possible).
b) Given that the relationship with TFLite appears limited to the FlatBuffers format and its loading into the runtime, is this approach intended as a temporary solution until TFLite fully supports LLMs, or is it expected to remain in place?
Thank you for your time and assistance.
Best regards,