Hi all,
I hope this is not off-topic, but ever since I first read
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p1360r0.pdf
it has not been clear to me what this SG is trying to tackle:
1. Language constructs needed for ML
2. Data structures for ML algorithms (tensors, graphs, etc.)
3. A library of common building blocks for ML (optimizers, linear algebra)
4. A full-fledged ML and data science ecosystem (i.e. CERN's Root on steroids)
Also, which community are we targeting? C++ developers
building/deploying production systems, or scientists designing the
algorithms and pipelines for data analytics and prediction? These are
completely different communities, and the latter, to my knowledge, does
not use C++ (because they are not taught C++ and are afraid of it; see
SG20).
I would very much like to be able to implement data science and ML
pipelines in C++ instead of Python (because of the type system, the
performance, etc.), but the lack of a C++ equivalent of
scikit-learn+pandas+dask makes it nearly impossible.
I may be approaching things the wrong way in my work, but nowadays
colleagues using Python are much more productive, not because of the
language, but because there are /de facto/ standards
(numpy+pandas+matplotlib) that have made it possible to build an
ecosystem for data science and machine learning in which all the
higher-level libs (SciPy, scikit-learn, scikit-image) are easy to
compose.
If we want to do this in C++, we spend a lot of time and energy choosing
the appropriate lib (I like the Random Forest from Shark, but prefer the
NN from Dlib or the logistic regression from mlpack), and then our code
has to convert data from Shark vectors to Eigen arrays and then to
Armadillo...
In C++ we don't even have an equivalent of the GNU GSL, yet there are 3
or 4 implementations of tensor libs with expression templates for deep
learning.
There are efforts, like the Apache Arrow project
(https://arrow.apache.org/), that will make it easier to share in-memory
data between languages, and maybe then the need to do ML in C++ will
disappear altogether.
I find the existence of this SG really thrilling, but I am afraid that,
from the outside, its goals and scope are not clear. Maybe this is
absolutely OK at this point in time (I have never taken part in such a
group), so don't take my thoughts too seriously!
Thanks for this amazing initiative.
Jordi
On Tue 15-Jan-2019 at 17:56:53 +01, "Michael Wong (via Google Docs)" <fragga...@gmail.com> wrote:
> Michael Wong has invited you to edit the following document:
> PxxxxR0: SG19 Machine Learning Layered list