Hi,
After training a model that performs sufficiently well for our use case, we would like to plan for putting it into production. As a starting point, all I know is that the library ships with an executable: I can spawn it from whatever programming language, write inputs to its stdin (a newline signaling the end of an input), and read the classification from its stdout. Hopefully this path is well tested already and free of memory leaks.
That, however, doesn't fit the standard concurrency model for high-volume environments (one prediction at a time per spawned executable, vs. being able to serve requests from multiple threads). Of course, we could write a small module managing a fleet of "worker" executables, dispatching each incoming request to an available worker.
A few words about the application use case: our scenario classifies short sentences that typically flow in one by one ― and it must provide real-time responsiveness and scale well. I wonder whether there are any other environments where the model can be used for real-time prediction/inference as is, before we develop our own wrappers.
Also, in case there's an entry point for using the code as a library rather than an executable ― we'd like to know about that.
Can you share some advice on this?
Thanks,
Matan