If you want to pass data through an external C++ program, there's already an operation on Spark's distributed datasets (RDDs) called pipe(). You give it a command, and it will launch that process (one instance per partition), write each element of the dataset to its stdin, one element per line, and return an RDD of strings, one for each line the process prints to its stdout.
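For example, here's a minimal sketch, where ./my_cpp_filter is a hypothetical C++ binary that reads lines from stdin and writes lines to stdout:

    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("PipeExample"))

    // pipe() launches one external process per partition and writes each
    // element to that process's stdin, one element per line.
    val input = sc.parallelize(Seq("alpha", "beta", "gamma"))
    val piped = input.pipe("./my_cpp_filter") // hypothetical C++ binary

    // Every line the processes print to stdout becomes one string here.
    piped.collect().foreach(println)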
The other thing you can do, of course, is to write a native library and access it through JNI, the same way you'd call it from Java.
Matei
You might also be able to write a version of pipe() / PipedRDD that takes byte arrays instead of strings, and writes them directly to the subprocess's stdin (the stream returned by Process.getOutputStream()). If you'd like to write that and send a pull request on GitHub, I can incorporate it into the code.
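Until something like that exists, here's a rough sketch of the idea using mapPartitions; how records are framed on stdin/stdout is left to your protocol, and ./my_cpp_filter is again hypothetical:

    import java.io.ByteArrayOutputStream
    import org.apache.spark.rdd.RDD

    def pipeBytes(rdd: RDD[Array[Byte]], command: String): RDD[Array[Byte]] =
      rdd.mapPartitions { iter =>
        val proc = new ProcessBuilder(command).start()

        // Feed the raw bytes to the child's stdin from a separate thread so
        // a full pipe buffer can't deadlock us against the read loop below.
        new Thread("stdin-writer") {
          override def run(): Unit = {
            val out = proc.getOutputStream // connected to the child's stdin
            iter.foreach(b => out.write(b))
            out.close()
          }
        }.start()

        // Collect everything the child writes to stdout as one byte array.
        val in = proc.getInputStream
        val buf = new ByteArrayOutputStream()
        val chunk = new Array[Byte](4096)
        var n = in.read(chunk)
        while (n != -1) { buf.write(chunk, 0, n); n = in.read(chunk) }
        Iterator(buf.toByteArray)
      }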
Finally, you could also call your C++ code through JNI, but beware that there's some stuff you need to learn if you haven't used JNI before (http://java.sun.com/docs/books/jni/html/jniTOC.html).
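For a taste of what that looks like on the JVM side, here's a minimal sketch, assuming a hypothetical native library "myfilter" (libmyfilter.so on Linux); remember the library has to be available on every worker node:

    class NativeFilter {
      // The C++ side must export the matching symbol:
      //   JNIEXPORT jbyteArray JNICALL
      //   Java_NativeFilter_transform(JNIEnv*, jobject, jbyteArray)
      @native def transform(input: Array[Byte]): Array[Byte]
    }

    object JniExample {
      def main(args: Array[String]): Unit = {
        System.loadLibrary("myfilter") // must be on java.library.path
        val result = new NativeFilter().transform(Array[Byte](1, 2, 3))
        println(result.length)
      }
    }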
Matei
If you just want to go with base64 encoding, you can probably do it on top of the existing API without making a new RDD subclass: call map() to turn your byte arrays into base64-encoded strings, then call pipe(), then map() the results back into byte arrays.
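Something along these lines, assuming your records are byte arrays and the (hypothetical) ./my_cpp_filter reads and writes one base64-encoded record per line:

    import java.util.Base64
    import org.apache.spark.rdd.RDD

    def pipeViaBase64(data: RDD[Array[Byte]]): RDD[Array[Byte]] = {
      val encoded = data.map(b => Base64.getEncoder.encodeToString(b)) // bytes -> one text line
      val piped   = encoded.pipe("./my_cpp_filter")                    // line in, line out
      piped.map(line => Base64.getDecoder.decode(line))                // text line -> bytes
    }

Since Base64.getEncoder emits no line breaks, one record per line is a safe framing.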
Matei