mediapipe hand tracking on Android, C++ ndk side


Andrea Fiorito

Oct 11, 2024, 9:50:53 PM
to MediaPipe
Hello,
my goal is to write a prototype on Android using GameActivity that does 3D hand-landmark tracking. GameActivity lives on the C++ NDK side, which means the app is written mainly in C/C++. The HandLandmarker solution shows exactly the functionality I'm interested in, but it is written in Kotlin, so it doesn't apply to my use case.
I use a custom module (written by myself) based on the camera2ndk API to get a live stream from the camera; on a separate thread it invokes a callback with the signature:
void processFrame(int w, int h, uint8_t *buffer, int64_t timestampNs, ImagePipe &pipe)
The image in buffer is in yuv12 format but easily convertible to RGB(A) if needed. ImagePipe contains a triple buffer for output purposes (pipe.imageTripleBuf->GetWriteBuffer()).
For the prototype's purposes the hand landmarks can just be logged, so I don't need the visualisation part.
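(For reference, a minimal sketch of the YUV-to-RGBA step mentioned above, assuming planar I420 layout with tightly packed planes and BT.601 full-range coefficients. The actual plane layout from camera2ndk — I420 vs NV12/NV21, row strides — must match your camera output, so treat this as illustrative only:)

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Convert one I420 (planar YUV 4:2:0) frame to interleaved RGBA.
// Assumes tightly packed planes: Y (w*h), then U and V ((w/2)*(h/2) each).
std::vector<uint8_t> ConvertI420ToRGBA(int w, int h, const uint8_t* buffer) {
    const uint8_t* y_plane = buffer;
    const uint8_t* u_plane = buffer + w * h;
    const uint8_t* v_plane = u_plane + (w / 2) * (h / 2);
    std::vector<uint8_t> rgba(static_cast<size_t>(w) * h * 4);
    for (int row = 0; row < h; ++row) {
        for (int col = 0; col < w; ++col) {
            int y = y_plane[row * w + col];
            // U and V are subsampled 2x2, so each chroma sample covers
            // a 2x2 block of luma samples.
            int u = u_plane[(row / 2) * (w / 2) + col / 2] - 128;
            int v = v_plane[(row / 2) * (w / 2) + col / 2] - 128;
            // Fixed-point BT.601 full-range conversion (coefficients * 65536).
            int r = y + ((91881 * v) >> 16);
            int g = y - ((22554 * u + 46802 * v) >> 16);
            int b = y + ((116130 * u) >> 16);
            size_t i = (static_cast<size_t>(row) * w + col) * 4;
            rgba[i + 0] = static_cast<uint8_t>(std::clamp(r, 0, 255));
            rgba[i + 1] = static_cast<uint8_t>(std::clamp(g, 0, 255));
            rgba[i + 2] = static_cast<uint8_t>(std::clamp(b, 0, 255));
            rgba[i + 3] = 255;  // opaque alpha
        }
    }
    return rgba;
}
```

In production you would likely use libyuv (which ships with Android) instead of a hand-rolled loop, but the above shows the data layout involved.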

To succeed, I would need:
1- one or more .so libraries to link against in my CMakeLists.txt file
2- a set of headers (covering the MediaPipe public API) to include in my .cpp files
3- any input/configuration files needed
4- the code to do the actual processing.

For point 4, it looks like the code should be something along the lines of:

// Shared graph state; sketch based on demo_run_graph_main.cc.
mediapipe::CalculatorGraph graph;
std::unique_ptr<mediapipe::OutputStreamPoller> poller;

absl::Status init() {
    // Load the graph config from a file (it could also be a string
    // constant, as in hello_world.cc) and parse it.
    std::string config_contents;
    MP_RETURN_IF_ERROR(mediapipe::file::GetContents(
        "mediapipe/graphs/hand_tracking/hand_tracking_desktop_live.pbtxt",
        &config_contents));
    mediapipe::CalculatorGraphConfig config =
        mediapipe::ParseTextProtoOrDie<mediapipe::CalculatorGraphConfig>(
            config_contents);
    MP_RETURN_IF_ERROR(graph.Initialize(config));

    // Attach a poller to the landmark stream before starting the graph.
    // The stream name and packet type depend on the graph used.
    auto poller_or = graph.AddOutputStreamPoller("landmarks");
    if (!poller_or.ok()) return poller_or.status();
    poller = std::make_unique<mediapipe::OutputStreamPoller>(
        std::move(poller_or.value()));

    // Start the graph.
    return graph.StartRun({});
}

absl::Status processFrame(int w, int h, uint8_t *buffer,
                          int64_t timestampNs, ImagePipe &pipe) {
    // This may not be necessary if the graph accepts GRAY8 or YCbCr 4:2:0
    // input directly; convertToRGBA is assumed to return a byte container.
    auto image_data = convertToRGBA(w, h, buffer);

    // Create a MediaPipe ImageFrame with the RGBA data. Adopt() takes
    // ownership, so the frame must be heap-allocated, not stack-allocated.
    auto image_frame = std::make_unique<mediapipe::ImageFrame>(
        mediapipe::ImageFormat::SRGBA, w, h,
        mediapipe::ImageFrame::kGlDefaultAlignmentBoundary);
    std::memcpy(image_frame->MutablePixelData(), image_data.data(),
                image_frame->PixelDataSize());

    // MediaPipe timestamps are in microseconds.
    auto input_packet = mediapipe::Adopt(image_frame.release())
                            .At(mediapipe::Timestamp(timestampNs / 1000));

    // Add the image packet to the graph.
    MP_RETURN_IF_ERROR(
        graph.AddPacketToInputStream("input_video", std::move(input_packet)));

    // Block until the landmarks for this frame are available.
    mediapipe::Packet landmark_packet;
    if (!poller->Next(&landmark_packet)) {
        return absl::UnknownError("Output stream closed.");
    }

    // The hand-tracking graphs emit one NormalizedLandmarkList per
    // detected hand. Log every landmark of every hand.
    const auto& hands =
        landmark_packet.Get<std::vector<mediapipe::NormalizedLandmarkList>>();
    for (const auto& hand : hands) {
        for (int i = 0; i < hand.landmark_size(); ++i) {
            const mediapipe::NormalizedLandmark& lm = hand.landmark(i);
            LOGI("Landmark [%d] - x: %f, y: %f, z: %f",
                 i, lm.x(), lm.y(), lm.z());
        }
    }
    return absl::OkStatus();
}

absl::Status cleanup() {
    // Close the input stream and wait for the graph to drain.
    MP_RETURN_IF_ERROR(graph.CloseInputStream("input_video"));
    return graph.WaitUntilDone();
}

This comes from ChatGPT, and it's along the lines of the files:
mediapipe/examples/desktop/demo_run_graph_main.cc
mediapipe/examples/desktop/demo_run_graph_main_gpu.cc
mediapipe/examples/desktop/simple_run_graph_main.cc

I can also see that in mediapipe/examples/desktop/hello_world/hello_world.cc the graph is passed as a string constant, so I could bypass file-system access for that.
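(A minimal sketch of that string-constant approach. The config below is illustrative only, not the actual hand-tracking graph; in practice the full text of e.g. hand_tracking_mobile.pbtxt would go in the constant, and it would be fed to mediapipe::ParseTextProtoOrDie<mediapipe::CalculatorGraphConfig>() instead of reading a file:)

```cpp
#include <string>

// Illustrative only: a trivial pass-through graph embedded as a string
// constant, avoiding any file-system access at runtime.
constexpr char kGraphConfig[] = R"pb(
  input_stream: "input_video"
  output_stream: "landmarks"
  node {
    calculator: "PassThroughCalculator"
    input_stream: "input_video"
    output_stream: "landmarks"
  }
)pb";

// Tiny helper so the constant can be sanity-checked without MediaPipe.
bool MentionsStream(const char* config, const char* stream) {
  return std::string(config).find(stream) != std::string::npos;
}
```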

In mediapipe/graphs/hand_tracking I can see there are different graphs, and I'm not sure which one would be best for my case. I'd prefer the fastest, and I guess that's the GPU one; hand_tracking_mobile.pbtxt seems a good candidate, and it might be possible to modify it to remove the image output as an optimisation. On the other hand, the desktop examples seem to use other graphs.

For point 1:
I have no idea how to cross-compile the part I need in order to generate the .so library. The installation guide looks like it's meant to compile for the host machine, i.e. where target and host are the same (https://ai.google.dev/edge/mediapipe/framework/getting_started/install).
The Android guide https://ai.google.dev/edge/mediapipe/framework/getting_started/install seems useful for compiling the examples for Android, but it does not clarify how to specify the OpenCV version for Android, or how to handle any other required libraries (absl?).
Could I use libmediapipe_tasks_vision_jni.so, extracted from tasks-vision-0.10.14.aar, and bypass this step altogether?
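(Whichever .so is used, the CMake side is the easy part. A hypothetical fragment, assuming a prebuilt library and headers have been copied into the project; all paths and target names here are made up and must be adapted:)

```cmake
# Hypothetical layout: libs/<abi>/libhand_landmarker.so plus headers
# under third_party/mediapipe/include. Adjust names to your project.
add_library(hand_landmarker SHARED IMPORTED)
set_target_properties(hand_landmarker PROPERTIES
    IMPORTED_LOCATION
    ${CMAKE_CURRENT_SOURCE_DIR}/libs/${ANDROID_ABI}/libhand_landmarker.so)

# "game_activity_app" stands in for your own NDK target.
target_include_directories(game_activity_app PRIVATE
    ${CMAKE_CURRENT_SOURCE_DIR}/third_party/mediapipe/include)
target_link_libraries(game_activity_app hand_landmarker)
```

The open question above — where the .so itself comes from — is the hard part; linking an IMPORTED library like this is standard CMake.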

For point 2:
It seems there is no public API encapsulated in a single .h file. Do I have to bundle the whole set of header files if I want to share this as an Android Studio project? I can't find any C++ or C API in the documentation here: https://ai.google.dev/edge/api

For point 3:
I'm not sure what other files are needed: in the Kotlin example (from https://github.com/google-ai-edge/mediapipe-samples) there is a file hand_landmarker.task in the assets folder, and I'm not sure what it contains. The graph could even be put in as a string constant, but on the other hand I'm not sure why https://ai.google.dev/edge/mediapipe/framework/getting_started/hello_world_android says "MediaPipe graphs are .pbtxt files, but to use them in the application, we need to use the mediapipe_binary_graph build rule to generate a .binarypb file".
In the desktop example, the build file seems to depend on files like //mediapipe/modules/hand_landmark:hand_landmark_full.tflite, but I'm not sure where to get the right one for Android, if needed.
I can also see that the solution in mediapipe-samples sets various parameters in HandLandmarkerHelper.kt, so I guess that if I want to reproduce exactly the same functionality on the NDK side I'll have to set the same parameters. Do those end up as side packets, or something else?

Would anyone be able to shed some light on these points and help me make progress?

Sorry if the questions are confusing or don't make full sense; I'm new to this library, but I find it very interesting and would like to become proficient in it.
In my opinion, it would be cool if Android were supported on the NDK side with some examples and/or solutions.

Thanks!

Andrea Fiorito

Oct 31, 2024, 12:24:14 AM
to MediaPipe
For anyone who is interested in the same subject,

I was able to achieve my goal by cross-compiling libhand_landmarker.so for Android. This is found in MediaPipe Tasks, under the C API. I identified a small subset (~10) of the header files needed, and I based the processing function on the mediapipe/tasks/c/vision/hand_landmarker/hand_landmarker_test.cc file.

I'm not sure what the exact differences are compared with the Kotlin sample in the mediapipe-samples repository. I think one is that the C API does not currently support GPU processing.

I can share the source code of the prototype if anyone is interested; feel free to contact me in that case.

Federica Paolì

Sep 5, 2025, 12:56:13 PM
to MediaPipe
Hi Andrea,

I would also like to cross-compile an ARM .so for Android, but for face landmarks.
I would be very interested in the source code of your prototype; I have tried compiling it myself, but I can only build it for x86_64.

Thanks in advance!