fasttext production architecture


Matan Safriel

Jun 18, 2017, 8:22:47 AM
to fastText library
Hi,

After generating a model that performs sufficiently well for our use case, we wish to plan for putting it into production. As a starting point, all I know is that the library comes with an executable, and of course I can spin it up from whatever programming language, push inputs on stdin (a newline signaling the end of an input) and receive the classification on the executable's stdout. Hopefully there are no memory leaks and this path has been well tested already.

That, however, doesn't extend to a very standard concurrency model for high-volume environments (one prediction at a time per spawned executable, vs. being able to serve predictions from multiple threads). Of course, we could have a small module managing a small fleet of "worker" executables, dispatching each incoming request to an available worker.

A few words about the application use case: our scenario classifies short sentences flowing in typically one by one ― and it must provide real-time responsiveness and be highly scalable. I wonder whether there are any other environments where the model can be used for real-time prediction/inference as is, before we develop our own wrappers.

Also, in case there's an entry point for using the code as a library rather than as an executable ― we'd like to know about that.

Can you share some advice on this?

Thanks,
Matan






Matan Safriel

Jun 18, 2017, 8:29:37 AM
to fastText library
Forgot to say it explicitly ― I am referring to using the classification feature in production, not to generating word embeddings in production.

joneide...@gmail.com

Jul 3, 2017, 7:34:18 AM
to fastText library
I'm in the exact same situation as Matan. Has anyone found a viable solution to this yet?

Matan Safriel

Jul 3, 2017, 4:51:43 PM
to fastText library, joneide...@gmail.com
So, there are several hypothetical paths that come to mind, considering prediction (not training) capability:


  1. Someone could implement the prediction algorithm outside the fasttext codebase, e.g. as part of an existing machine learning library. I'm not aware that this has been done anywhere yet.


  2. If the fasttext maintainers were to separate fasttext into a library and an executable, there would be a C++ library that anyone could use to predict from already-trained models. You'd then use that library directly if your code base is C++, or from a language that can reasonably bind to C++, such as Python.

    In this hypothetical scenario, if your codebase isn't one of the lucky languages sporting good C++ integration, you could make do with an HTTP server that uses the fasttext C++ library, and call that server from your other-language code.

    As said, this requires that someone tease apart fasttext into a library and an executable, as opposed to the executable-only code base it seems to be now (and that this change be accepted into the trunk).

    It also implies making sure that some or most of fasttext's C++ functions are reentrant, so that the library lets you call the prediction function from multiple C++ threads, which would make it a better fit for high-throughput scenarios. Some refactoring may be needed as part of this.


  3. Otherwise, anyone can spin up the fasttext executable as an OS shell process, invoked with the `-` option that makes it read from stdin and output one prediction per line on stdout. This is easy to do in Python, Node.js, and even (slightly less elegantly) Java; a C++ sketch of such a wrapper follows this list.

    In exploratory testing, fasttext appears to emit its prediction on stdout for every new line it encounters on stdin, which makes it a fit for "online" usage.

    The minor trouble with this approach is that it won't scale very well in terms of memory requirements. Trained models can be quite large, and for sub-second latency at scale, you'd want to spin up several such processes. The problem is that this will consume quite a bit of memory (e.g. in my case, the model size is 0.5 GB and each fasttext process takes up around 0.75 GB of RAM). That said, for a single line of text, my specific model emits its prediction in ~1.5 ms on average, so a single process can still do a lot on its own, even without siblings forming a truly concurrent ensemble; the memory consumption problem is therefore somewhat attenuated.

    One way to solve the memory scalability issue might be to add, in the fasttext codebase or elsewhere, code that reads a model from disk and places it into OS shared memory (a rough sketch of that part also appears after this list). With a (relatively) small change to the main fasttext executable, and the necessary changes to the docs and help message, fasttext could gain an option to look for the model in shared memory rather than load it from disk. I think this would solve the memory scalability challenge, at the cost of some extra complexity (the shared memory API varies slightly from OS to OS). This assumes that most of the 0.75 GB of RAM per fasttext process is simply the 0.5 GB trained model.
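
To make option 3 concrete, below is a minimal POSIX-only C++ sketch of a parent process that spawns `fasttext predict <model> -`, writes one sentence per line to its stdin, and reads one prediction line back from its stdout. The `./fasttext` and `model.bin` paths and the sample sentences are placeholders for your own, and it assumes the executable flushes its output after every prediction (which the exploratory testing above suggests it does); error handling is kept to a minimum.

    // Sketch only: spawn `./fasttext predict model.bin -` (placeholder paths)
    // and exchange one line of input for one line of prediction at a time.
    // POSIX only; compile with e.g.: g++ -std=c++11 -o ft_pipe ft_pipe.cpp
    #include <cstdio>
    #include <cstring>
    #include <sys/wait.h>
    #include <unistd.h>

    int main() {
      int to_child[2];    // parent writes  -> child's stdin
      int from_child[2];  // child's stdout -> parent reads
      if (pipe(to_child) != 0 || pipe(from_child) != 0) { perror("pipe"); return 1; }

      pid_t pid = fork();
      if (pid < 0) { perror("fork"); return 1; }

      if (pid == 0) {  // child: become the fasttext process
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]); close(to_child[1]);
        close(from_child[0]); close(from_child[1]);
        execlp("./fasttext", "fasttext", "predict", "model.bin", "-", (char*)nullptr);
        perror("execlp");  // only reached if exec failed
        _exit(127);
      }

      // parent: wrap the pipe ends with stdio streams for line-based I/O
      close(to_child[0]);
      close(from_child[1]);
      FILE* child_in  = fdopen(to_child[1], "w");
      FILE* child_out = fdopen(from_child[0], "r");

      const char* sentences[] = {  // example inputs only
          "which baking dish is best to bake a banana bread",
          "why not put knives in the dishwasher"};
      char prediction[4096];
      for (const char* s : sentences) {
        fprintf(child_in, "%s\n", s);  // one input per line
        fflush(child_in);
        if (fgets(prediction, sizeof(prediction), child_out)) {
          prediction[strcspn(prediction, "\n")] = '\0';  // strip trailing newline
          printf("%s  =>  %s\n", s, prediction);
        }
      }

      fclose(child_in);   // EOF on the child's stdin lets fasttext exit
      fclose(child_out);
      waitpid(pid, nullptr, 0);
      return 0;
    }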
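
And here is a rough, heavily simplified sketch of the shared-memory half of that idea: copying a trained model file into a named POSIX shared-memory object that sibling processes could then map read-only. To be clear, fasttext itself would still need a code change to actually consume a model from a memory buffer instead of a file; the object name `/fasttext_model` and the `model.bin` path are placeholders.

    // Sketch only: publish the raw bytes of a trained model in POSIX shared
    // memory. On Linux, compile with: g++ -std=c++11 shm_model.cpp -lrt
    #include <cstdio>
    #include <fcntl.h>
    #include <fstream>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main() {
      std::ifstream model("model.bin", std::ios::binary | std::ios::ate);
      if (!model) { std::perror("model.bin"); return 1; }
      const long long size = model.tellg();
      model.seekg(0);

      // Create (or reuse) a named shared-memory object and size it to the model.
      int fd = shm_open("/fasttext_model", O_CREAT | O_RDWR, 0600);
      if (fd < 0) { std::perror("shm_open"); return 1; }
      if (ftruncate(fd, size) != 0) { std::perror("ftruncate"); return 1; }

      void* mem = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
      if (mem == MAP_FAILED) { std::perror("mmap"); return 1; }

      model.read(static_cast<char*>(mem), size);  // copy the model bytes in
      munmap(mem, size);
      close(fd);
      std::printf("placed %lld model bytes in shared memory object /fasttext_model\n", size);
      return 0;
    }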


Personally, I think option (2) would be the best one to consume. It would likely enable not just prediction through an API, but also training a new model, or continuing to train an existing one, through the same API. A Python wrapper could then easily be devised on top of it. Some new tests would have to be added.


But meanwhile, as for option 3, I started a toy implementation that wraps the executable and serves predictions back over HTTP. Call it a very simple fasttext REST server: it starts up with a given model and serves classification requests against that model. Hopefully there are no memory leaks when the fasttext executable is used like this over time.

If you'd like to reuse it, drop me a line with any special wishes around it, and I'll see what I can do.

Let me know what I got wrong above.

Cheers,
Matan


Edouard G.

Jul 4, 2017, 1:21:48 PM
to fastText library
Hi,

Thank you for your comments.

It should be fairly easy to do option (2) with the current codebase. If you look at the fasttext.h file, the FastText class should already have all the required methods: 
 - you can load an existing model with loadModel
 - you can then call the predict method (the one which takes a std::vector<std::pair<real, std::string>>& as third argument, to store the predictions of the model). Please pull the latest version of fastText as we just fixed a small bug in that function.
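
For example, a minimal prediction loop over stdin could look roughly like the sketch below (the model path is a placeholder, and it assumes exactly the two calls above: loadModel with a path, and the three-argument predict filling a vector of (score, label) pairs):

    // Rough sketch of using fastText as a library for prediction, compiled
    // against the fastText sources (e.g. together with its .cc files).
    #include <iostream>
    #include <sstream>
    #include <string>
    #include <utility>
    #include <vector>
    #include "fasttext.h"

    int main() {
      fasttext::FastText model;
      model.loadModel("model.bin");  // placeholder path to a trained classifier

      std::string line;
      while (std::getline(std::cin, line)) {  // one sentence per line
        std::istringstream in(line);
        std::vector<std::pair<fasttext::real, std::string>> predictions;
        model.predict(in, 1, predictions);    // k = 1: keep only the best label
        if (!predictions.empty()) {
          std::cout << predictions[0].second << "\t" << predictions[0].first << "\n";
        }
      }
      return 0;
    }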

Would this work for you?

Best,
Edouard.

Matan Safriel

Jul 4, 2017, 1:34:37 PM
to Edouard G., fastText library
Thanks Edouard, I didn't realize that.

So, a question about this: in the meantime I already have a Clojure prototype working, where a web server responds with predictions by managing a fasttext process over stdin/stdout.

Do you think that a proper addition, submitted as a pull request with all the implied tests and documentation, would make it into the trunk? Just curious; my option 3 solution has taken all the time I have for this right now, but maybe someone fluent in C++ will pick up the item sooner or later...

Matan


Alex Ott

Nov 30, 2017, 2:39:08 PM
to fastText library
For option 2, I would recommend looking at grpc.io - it could be more performant, and easier to write.

sandy64...@gmail.com

Jul 17, 2018, 3:29:09 PM
to fastText library
By the way, where can I find the executable file for Windows OS?