Hi Ron,
> Are there any performance benchmarks that have been
> done with respect to Tensorflow Serving vs Tensorflow-converted
> PMML models?
>
I'm only testing the correctness of converted models:
https://github.com/jpmml/jpmml-tensorflow/tree/master/src/test/java/org/jpmml/tensorflow
The correctness check asserts that TensorFlow predictions and PMML
predictions are equivalent. The goal is to reach absolute equivalence,
which means that TensorFlow and PMML predictions will not differ by
more than 1 ULP.
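For illustration, a 1 ULP tolerance check can be sketched in plain
Java using Math.ulp. The class and method names below are my own
invention for this example, not part of the JPMML test suite:

```java
public class UlpCheck {

    // True if the two values differ by at most 1 unit in the last place
    static boolean withinOneUlp(double expected, double actual) {
        return Math.abs(expected - actual) <= Math.ulp(expected);
    }

    public static void main(String[] args) {
        double expected = 0.1 + 0.2; // 0.30000000000000004
        // The double literal 0.3 sits exactly 1 ULP below (0.1 + 0.2)
        System.out.println(withinOneUlp(expected, 0.3));       // true
        System.out.println(withinOneUlp(expected, 0.3000001)); // false
    }
}
```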
I always prioritize correctness over (initial) performance. A correct
thing can be made faster easily. The opposite - making a
fast-performing thing more correct - is much more difficult.
The evaluation of neural network models was heavily refactored in
JPMML-Evaluator version 1.3.7:
https://github.com/jpmml/jpmml-evaluator/commit/7eba189b1e5b8c3b7f3ebd4c9129ea8876af19a5
This refactoring gives JPMML-Evaluator the ability to evaluate models
using different math contexts (see
http://mantis.dmg.org/view.php?id=179). For example, R, Scikit-Learn
and Apache Spark ML export NN models that assume 64-bit FP math
context, whereas TensorFlow exports NN models that assume 32-bit FP
math context.
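As a quick illustration of why the math context matters, here is a
self-contained sketch (my own example, not JPMML code) that evaluates
the same tiny dot product - the core operation of a NN layer - in
32-bit and in 64-bit FP, and shows that the results diverge:

```java
public class MathContextDemo {

    public static void main(String[] args) {
        // Weights as exported by a 32-bit FP framework such as TensorFlow
        float[] weights = {0.1f, 0.2f, 0.3f};
        float input = 0.7f;

        // 32-bit math context: every multiply and add is rounded to float
        float sum32 = 0f;
        // 64-bit math context: same starting values, but double arithmetic
        double sum64 = 0d;
        for (float w : weights) {
            sum32 += w * input;
            sum64 += (double) w * (double) input;
        }

        System.out.println(sum32);
        System.out.println(sum64);
        // Rounding errors accumulate differently, so the results differ
        System.out.println((double) sum32 == sum64);
    }
}
```

This is why a PMML consumer must honor the math context that the
producer assumed; re-evaluating TensorFlow weights in 64-bit FP would
break the 1 ULP equivalence goal.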
To answer your question, I converted the DNNRegressionAuto model
(https://github.com/jpmml/jpmml-tensorflow/tree/master/src/test/resources/savedmodel/DNNRegressionAuto)
to the PMML data format, and profiled it using the
org.jpmml.evaluator.EvaluationExample command-line example
application:
$ cd jpmml-tensorflow
$ java -jar target/converter-executable-1.0-SNAPSHOT.jar \
  --tf-savedmodel-input src/test/resources/savedmodel/DNNRegressionAuto/ \
  --pmml-output Auto.pmml
$ java -cp ../jpmml-evaluator/pmml-evaluator-example/target/example-1.3-SNAPSHOT.jar \
  org.jpmml.evaluator.EvaluationExample \
  --input src/test/resources/csv/Auto.csv --model Auto.pmml \
  --output /dev/null --loop 10000
This is the profiling summary on my ancient laptop:
<stdout>
main
count = 10000
mean rate = 343,65 calls/second
1-minute rate = 292,76 calls/second
5-minute rate = 271,78 calls/second
15-minute rate = 267,71 calls/second
min = 2,56 milliseconds
max = 124,31 milliseconds
mean = 2,91 milliseconds
stddev = 1,79 milliseconds
median = 2,63 milliseconds
75% <= 2,71 milliseconds
95% <= 3,72 milliseconds
98% <= 4,88 milliseconds
99% <= 8,26 milliseconds
99.9% <= 21,34 milliseconds
</stdout>
It shows that on average a single call takes 2.91 milliseconds to
score all 392 data records in Auto.csv, which translates to ~135'000
data records per second. By moving over to more modern desktop
hardware, and running four threads instead of one, one could easily
get to 1 million scores per second.
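The arithmetic behind those figures, as a back-of-the-envelope sketch
(the thread count and the per-core speedup are assumptions, not
measurements):

```java
public class ThroughputEstimate {

    public static void main(String[] args) {
        int recordsPerCall = 392;     // rows in Auto.csv
        double meanCallMillis = 2.91; // mean call time from the profile

        double recordsPerSecond = recordsPerCall / (meanCallMillis / 1000.0);
        System.out.printf("single thread: %.0f records/second%n", recordsPerSecond);

        // Assumed: 4 threads, and roughly 2x faster per-core desktop hardware
        double estimated = recordsPerSecond * 4 * 2;
        System.out.printf("estimated:     %.0f records/second%n", estimated);
    }
}
```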
The selling point of PMML has never been raw performance. The selling
point is having a standalone 1 MB JAR file, which can score pretty
much any predictive model out there.
VR