Feedback on Linear Algebra


Matthieu Brucher

Jan 5, 2019, 3:50:01 AM
to sg...@isocpp.org
Hi all,

I will finish adding some comments on the SG14 linear algebra preliminary document, but I would also like some more general feedback on a follow-up paper/presentation that I think we will have to do.

Currently, the linear algebra effort is very much focused on basic linear algebra, whereas my view is that the real requirement is a proper multi-rank container with basic operations, on top of which we can build linear algebra, graphics, ML... Basically, the multi-rank Fortran array with modern numpy usage (views, broadcasting) and, more importantly, all the versatility C++ provides (static and dynamic sizes, speed...).
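
To make this concrete, here is a purely hypothetical sketch of the kind of usage I have in mind; none of these names (multi_array, slice, all) exist in any proposal or library:

    #include <cstddef>

    // Hypothetical interface sketch only: an owning container whose rank is a
    // compile-time parameter, while the shape can stay dynamic.
    template <class T, std::size_t Rank>
    class multi_array;

    // Imagined usage:
    //   multi_array<double, 2> m({3, 4});      // static rank, dynamic shape
    //   auto v = m.view(slice(0, 2), all());   // non-owning view, numpy-style
    //   auto c = m + row;                      // broadcasting a 1D row over m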

I haven't heard much on this topic; let's try to work something out.

Cheers,

Matthieu
--

Jordi Inglada

Jan 5, 2019, 10:49:32 AM
to Matthieu Brucher, sg...@isocpp.org
Hi,

Have you heard about the work the folks at QuantStack are doing on this:

https://github.com/QuantStack/xtensor

As far as I understand, their goal is to promote C++ as the language for the implementation of common data science algorithms, upon which bindings for the popular data science languages (Python, R and Julia) can be automatically generated. The current status in DS and ML is that many things have to be implemented 3 times (once for each of these languages) to reach a wide audience.

xtensor has an API which is very similar to Numpy's. QuantStack is also working on a data frame library (xframe, similar to R's built-in data frame or Python Pandas) and they have also developed a C++ Jupyter kernel (xeus-cling) for interactive development and data analysis.
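
For a quick taste of the similarity, a minimal sketch using xtensor's containers and views (this mirrors numpy's broadcasting and a[:, 1:3] slicing):

    #include <iostream>
    #include <xtensor/xarray.hpp>
    #include <xtensor/xio.hpp>
    #include <xtensor/xview.hpp>

    int main()
    {
        // 2x3 array, as numpy.array([[1, 2, 3], [4, 5, 6]])
        xt::xarray<double> a = {{1.0, 2.0, 3.0},
                                {4.0, 5.0, 6.0}};

        // Broadcasting: b is stretched along the first axis, like a + b in numpy.
        xt::xarray<double> b = {10.0, 20.0, 30.0};
        xt::xarray<double> c = a + b;

        // A view makes no copy, analogous to a[:, 1:3] in numpy.
        auto v = xt::view(a, xt::all(), xt::range(1, 3));

        std::cout << c << '\n' << v << '\n';
    }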

To my knowledge, this is the only team working to provide a consistent set of C++ libraries for DS, although I am not aware of them working on ML.

Cheers,

Jordi

Matthieu Brucher

Jan 5, 2019, 11:41:31 AM
to Jordi Inglada, sg...@isocpp.org
Hi,

Thanks a lot for the pointer; no, I didn't know about this one.
It does indeed look like good prior art for what I think we should aim for. I'll try to have a look at the way strides and shapes are handled (we may be able to get more performance by using fixed-size arrays there when the rank is static?).
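
Roughly what I have in mind there, as an illustrative sketch (not any existing library's actual layout):

    #include <array>
    #include <cstddef>
    #include <vector>

    // With a static rank, strides fit in a std::array: no heap allocation,
    // and the offset loop below gets a compile-time bound the optimizer can
    // fully unroll.
    template <std::size_t Rank>
    struct static_strides { std::array<std::ptrdiff_t, Rank> s; };

    // With a dynamic rank, they have to live in a std::vector.
    struct dynamic_strides { std::vector<std::ptrdiff_t> s; };

    template <class Strides, class Index>
    std::ptrdiff_t flat_offset(const Strides& st, const Index& idx)
    {
        std::ptrdiff_t off = 0;
        for (std::size_t i = 0; i < idx.size(); ++i)
            off += st.s[i] * static_cast<std::ptrdiff_t>(idx[i]);
        return off;
    }
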
Do you have any experience with the library?

Cheers,

Matthieu

Jordi Inglada

Jan 5, 2019, 1:40:22 PM
to Matthieu Brucher, sg...@isocpp.org
On Sat 05-Jan-2019 at 17:41:19 +01, Matthieu Brucher <matthieu...@gmail.com> wrote:
> Hi,
>
> Thanks a lot for the pointer; no, I didn't know about this one.
> It does indeed look like good prior art for what I think we should aim for. I'll try to have a look at the way strides and shapes are handled (we may be able to get more performance by using fixed-size arrays there when the rank is static?).

I think you may be interested in having a look at this post https://medium.com/@QuantStack/xtensor-c-and-the-julia-challenge-643d4b119761

> Do you have any experience with the library?
>

Well, just a bit. To learn xtensor, I ported Joel Grus' Neural Net toy library

https://github.com/joelgrus/joelnet

from Python to C++ in a few hours

https://github.com/inglada/joelnet/tree/master/include

which was easy because of the API similarity with Numpy.

I haven't had the chance to use xtensor for my work yet though.

Cheers,

Jordi

Matthieu Brucher

Jan 6, 2019, 10:20:57 AM
to Jordi Inglada, sg...@isocpp.org
That's great, thanks a lot. Guy (Davidson) will add xtensor to the prior art.
The best part is that it shows it can be done ;) Then we can also propose better APIs or syntax for C++, if that can simplify notation (I'm mainly thinking about a matrix multiplication operator and slices). If you have ideas there, I'll integrate them.

Cheers,

Matthieu

johan....@gmail.com

Jan 7, 2019, 9:46:34 AM
to SG19 - Machine Learning, jordi....@cesbio.eu
Hi all,

I'm a core developer of xtensor. We (the QuantStack team behind xtensor) would be thrilled to collaborate with you on this topic. To give you a bit of context, xtensor is a library that provides:

- an extensible expression engine that can operate on any object implementing a given interface (so not only in-memory containers, but also objects backed by a database or a representation on the file system).
- features inspired by Numpy, such as broadcasting, views, and operations. You can find a cheat sheet that enumerates the available features and their Numpy equivalents.
- an API following the idioms of C++: each expression, including the containers, provides inner typedefs as standard containers do, and iterator pairs so standard algorithms can be used with xtensor expressions; the method names tend to be consistent with those of the standard containers, and so on (see the sketch below).
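
For instance, a minimal sketch of mixing xtensor containers with standard algorithms:

    #include <algorithm>
    #include <numeric>
    #include <xtensor/xtensor.hpp>

    // xtensor containers expose begin()/end(), so <numeric> and <algorithm>
    // work on them directly.
    xt::xtensor<double, 2> a = {{1.0, 2.0}, {3.0, 4.0}};
    double total = std::accumulate(a.begin(), a.end(), 0.0); // 10.0
    auto max_it = std::max_element(a.begin(), a.end());      // points to 4.0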

Regarding the in-memory containers, xtensor provides three kinds of containers:

- xarray, an N-dimensional array with a dynamic shape and a dynamic number of dimensions
- xtensor, an N-dimensional array with a dynamic shape but a fixed number of dimensions
- xtensor_fixed, an N-dimensional array with a fully static shape
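
In code:

    #include <xtensor/xarray.hpp>
    #include <xtensor/xfixed.hpp>
    #include <xtensor/xtensor.hpp>

    // Dynamic shape, dynamic number of dimensions.
    xt::xarray<double> a = {{1.0, 2.0}, {3.0, 4.0}};

    // Dynamic shape, number of dimensions fixed to 2 at compile time.
    xt::xtensor<double, 2> t = {{1.0, 2.0}, {3.0, 4.0}};

    // Fully static 2x2 shape, so no heap allocation is needed.
    xt::xtensor_fixed<double, xt::xshape<2, 2>> f = {{1.0, 2.0}, {3.0, 4.0}};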

We continuously benchmark xtensor and get pretty good performance (close to Eigen for 1D and 2D operations, and close to a hand-written C loop implementation for higher dimensions), although there is still room for improvement for some specific operations (mainly reductions).

Jordi Inglada gave a good summary of our vision; you can find more detail in this blog post and this one.
We would be happy to answer any questions you may have.

Cheers,

Johan

Matthieu Brucher

Jan 8, 2019, 6:05:14 PM
to johan....@gmail.com, SG19 - Machine Learning, Jordi Inglada
Hi Johan,

Great, this sounds indeed interesting.
There is some pushback about the operations being component-wise only (I think this behavior is central to the usability of the container).
Is there a technical reason why xtensor has 3 containers instead of just one?

Cheers,

Matthieu



johan mabille

Jan 10, 2019, 5:41:30 AM
to Matthieu Brucher, SG19 - Machine Learning, Jordi Inglada
Hi Matthieu,

We also provide dot operations in xtensor-blas, and we plan to add a tensor dot in xtensor at some point. The Numpy experience has proven that it is more convenient to use operator* for element-wise multiplication than for dot products.
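
For illustration, with xtensor and xtensor-blas as they exist today:

    #include <xtensor/xarray.hpp>
    #include <xtensor-blas/xlinalg.hpp>

    xt::xarray<double> a = {{1.0, 2.0}, {3.0, 4.0}};
    xt::xarray<double> b = {{5.0, 6.0}, {7.0, 8.0}};

    // operator* is element-wise (Hadamard) multiplication, as in numpy.
    xt::xarray<double> hadamard = a * b;

    // Matrix multiplication goes through xtensor-blas.
    xt::xarray<double> matmul = xt::linalg::dot(a, b);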

Regarding the containers, they are really different and provide different constructors / initialization mechanisms. All of the common API has been abstracted into the xstrided_container and xcontainer base classes. I guess we could go further and provide a single container class, but I'm afraid it would require a lot of SFINAE magic to enable or disable methods depending on the shape type, so I'm not sure that would help with readability.

Cheers,

Johan.

Frank Seide

Jan 10, 2019, 12:45:31 PM
to johan mabille, Matthieu Brucher, SG19 - Machine Learning, Jordi Inglada

I agree that a good objective is to stay as close to Numpy as is meaningful, since it has already thought through and solved lots of problems.

The use of * for element-wise multiplication was unusual for me at first as well, but one gets used to it. I would consider element-wise products and matmul equally legitimate meanings. The only alternative would be to introduce new operators, such as Matlab’s “.*”.

I like that xtensor does not come with its own BLAS implementation, but provides an adapter instead. A C++ standard implementation should probably come with some default, but it might be good to write the standard in a way that lets users purchase a faster implementation and plug it in.

The lazy evaluation (xexpression) might be a basis for auto-diff, as well as for trace-based exporting of ML models as static graphs (e.g. as ONNX).
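
A short sketch of the laziness in question, using xtensor's expression behavior:

    #include <xtensor/xarray.hpp>

    xt::xarray<double> a = {1.0, 2.0, 3.0};
    xt::xarray<double> b = {4.0, 5.0, 6.0};

    // No computation happens here: e is an unevaluated expression tree,
    // which is exactly the kind of trace auto-diff or ONNX export needs.
    auto e = a + 2.0 * b;

    // Evaluation is triggered by assigning to a container (or iterating).
    xt::xarray<double> result = e;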

 

One question is how xtensor integrates with GPUs, which is the single most critical feature for machine learning. E.g. this may severely limit the use of lambdas.

 

What are the possibilities for extending the language syntax? For example:

- a multi-dimensional [] operator (currently one would probably emulate that with operator())
- syntactic sugar for Python-like index slices, e.g. [2:5] (currently one can use a special data structure such as [Slice(2,5)])
- a dot operator (Python 3.5 introduced @)
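
For reference, what the non-sugared forms look like in xtensor today:

    #include <xtensor/xarray.hpp>
    #include <xtensor/xbuilder.hpp>
    #include <xtensor/xview.hpp>

    xt::xarray<double> a = xt::zeros<double>({6, 6});

    // Multi-dimensional access via operator(), not a[i][j] or a[i, j].
    double x = a(1, 2);

    // Slicing via slice objects instead of Python's a[2:5, :].
    auto s = xt::view(a, xt::range(2, 5), xt::all());
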

Matthieu Brucher

Jan 10, 2019, 5:09:57 PM
to Frank Seide, johan mabille, SG19 - Machine Learning, Jordi Inglada
I think this is also very close to my position. I don't mind operator(); it's also what the other matrix libraries use.

Johan> how is the return type inferred if you have 3 different types? I would assume that you would return the most static one possible to get good performance? Also, are 1D and 2D arrays optimized to remove stride computation? Without them, a fixed-size 1D or 2D array would be as fast as a static int[x][y], which is what the geometry guys will want to have.

I have put together some slides; I'm currently re-reading them and will send them to the list later tonight.

Cheers,

Matthieu

johan mabille

Jan 10, 2019, 7:23:03 PM
to Matthieu Brucher, Frank Seide, SG19 - Machine Learning, Jordi Inglada
> One question is how xtensor integrates with GPUs, which is the single most critical feature for machine learning. E.g. this may severely limit the use of lambdas.
xtensor does not have GPU support yet; we will add it this year.

> how is the return type inferred if you have 3 different types? I would assume that you would return the most static one possible to get good performance?
Indeed, when different types are involved in an expression, the return type is the most static one possible.

> Also, are 1D and 2D arrays optimized to remove stride computation?
No, but benchmarks haven't shown significant differences with traditional implementations of 1D and 2D arrays (like Eigen, for instance), so it seems that compilers are smart enough to optimize away the stride computation.

Cheers,

Johan

Michael Wong

Jan 10, 2019, 9:33:51 PM
to SG19 - Machine Learning, matthieu...@gmail.com, fse...@microsoft.com, jordi....@cesbio.eu
Thanks, all, for the xtensor description. I would like to add it to the agenda as well if someone is prepared to present it.

johan mabille

Jan 11, 2019, 3:19:16 AM
to Michael Wong, SG19 - Machine Learning, Matthieu Brucher, Frank Seide, Jordi Inglada
I would be happy to present it. How long should this presentation be? Is there a format to follow?

Cheers,

Johan

Matthieu Brucher

Jan 11, 2019, 4:59:49 AM
to SG19 - Machine Learning, matthieu...@gmail.com, fse...@microsoft.com, jordi....@cesbio.eu


On Friday, January 11, 2019 at 00:23:03 UTC, johan mabille wrote:
> > One question is how xtensor integrates with GPUs, which is the single most critical feature for machine learning. E.g. this may severely limit the use of lambdas.
> xtensor does not have GPU support yet; we will add it this year.
>
> > how is the return type inferred if you have 3 different types? I would assume that you would return the most static one possible to get good performance?
> Indeed, when different types are involved in an expression, the return type is the most static one possible.
>
> > Also, are 1D and 2D arrays optimized to remove stride computation?
> No, but benchmarks haven't shown significant differences with traditional implementations of 1D and 2D arrays (like Eigen, for instance), so it seems that compilers are smart enough to optimize away the stride computation.

Indeed, that is what I would expect; the issue I'm concerned with is memory usage. AFAIK, geometry/game/other domains would expect a memory footprint of exactly 9 or 16 doubles for a (3,3) or (4,4) matrix, and nothing more.
Of course, it's not your job to tackle this ;)
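
Concretely, the kind of guarantee I mean (a sketch; whether xtensor_fixed meets it depends on which shape/stride members the implementation stores):

    #include <xtensor/xfixed.hpp>

    // A raw static array costs exactly 9 doubles.
    double raw[3][3];
    static_assert(sizeof(raw) == 9 * sizeof(double), "9 doubles, nothing more");

    // The question is whether a fixed-shape tensor can match that footprint.
    xt::xtensor_fixed<double, xt::xshape<3, 3>> m;
    // static_assert(sizeof(m) == sizeof(raw), ...); // implementation-dependent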
 
Cheers,

Matthieu


