R 3.3.1 depends on an SSL/TLS implementation

Jean-Pierre Flori

Oct 27, 2016, 4:03:03 AM
to sage-devel
Hi all,

The latest R versions depend on libcurl, and actually on more than that: on a libcurl with https support.
So we might want to build our own libcurl with https support (see #21767), but we then need an SSL/TLS implementation, which Sage currently provides only optionally, through OpenSSL, because of license issues. So we can:
[1] either make R depend on libcurl, which depends on openssl, so that they all become optional,
[2] or make R depend on libcurl, make them standard, and add an SSL/TLS implementation and its development headers as a prereq,
[3] or make libcurl with https support (and development headers) a prereq, which basically means adding an SSL/TLS implementation as a prereq as well,
[4] or make R a prereq,
[5] or drop R support,
[6] or patch R not to use curl,
[7] or patch R to use curl but without https support,
[8] or wait until the end of time,
[9] or a mix of all of this,
[10] or do something else.

What do you think?

Best,
JPF

Jeroen Demeyer

Oct 27, 2016, 5:26:05 AM
to sage-...@googlegroups.com
This is the obvious answer:

> [6] or patch R not to use curl

Why should I need a web download tool to do calculations in statistics?

Francois Bissey

Oct 27, 2016, 5:29:12 AM
to sage-...@googlegroups.com
R's package system includes a facility for downloading from repositories.
That’s why you need curl. At least in theory.
I’ll have to check for 3.3.x, but there were alternatives in 3.2.x.

François

Jeroen Demeyer

Oct 27, 2016, 5:41:20 AM
to sage-...@googlegroups.com
On 2016-10-27 11:29, Francois Bissey wrote:
> R's package system includes a facility for downloading from repositories.

That's fine, but it should be possible to just run R without its
packaging system.

Jean-Pierre Flori

Oct 27, 2016, 5:49:57 AM
to sage-devel
Then I suggest a new solution:
[666]:
* patch R if curl (with https support) is not there, and make curl an optional package,
* do not patch R if curl (with https support) is there.

Jeroen Demeyer

Oct 27, 2016, 6:02:42 AM
to sage-...@googlegroups.com
Conditionally patching is really bad because it cannot be pushed upstream.

Instead, in proper autoconf style, we should conditionally build the
parts requiring curl. That is, change the logic in the R build system to
build R without the packaging system if curl is not available.

Emmanuel Charpentier

Oct 27, 2016, 6:09:20 AM
to sage-devel
As of this morning, CRAN offered 9402 packages... These packages offer practical implementations for a tremendous list of applied statistics problems (and more...).

Using R "without its packaging system" is about as useful as using Python without pip or any other Python packaging system...

--
Emmanuel Charpentier

Francois Bissey

Oct 27, 2016, 6:11:43 AM
to sage-...@googlegroups.com
My own examination of the R-3.3.1 configure script is that configure will
fail if libcurl 7.28+ is not present, or is present with ssl disabled
[m4/R.m4, line 4215 and thereafter].
Unless the code for `install.packages` has significantly changed between
3.2.x and 3.3.x, it is an unnecessary failure, dictated by an unwillingness
to support the alternatives, i.e. not wanting to explain what to
do when libcurl is not present or lacks ssl support.
I doubt that
install.packages("$some_package", method="$somemethod")
where $somemethod is curl (command line) or wget has disappeared.
For example, the R-3.2.1 package for RH7.1/centos7.1 is either not compiled
against libcurl or lacks https support. An interesting experience I had at work
today with a user.

François
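
The condition described above (libcurl at least 7.28, with HTTPS among its supported protocols) can be sketched in Python. This is only an illustration under stated assumptions, not R's actual m4 test: the function names are hypothetical, and the inputs are assumed to be the text printed by `curl-config --version` and `curl-config --protocols`.

```python
import re

# Minimum libcurl version quoted in this thread for R 3.3.1.
MIN_LIBCURL = (7, 28, 0)


def parse_curl_version(text):
    """Extract a (major, minor, patch) tuple from text such as 'libcurl 7.50.1'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))


def parse_protocols(text):
    """Turn a one-protocol-per-line listing into a set of upper-case names."""
    return {line.strip().upper() for line in text.splitlines() if line.strip()}


def libcurl_satisfies_r(version_text, protocols_text):
    """Return True if libcurl is recent enough and was built with HTTPS."""
    version_ok = parse_curl_version(version_text) >= MIN_LIBCURL
    https_ok = "HTTPS" in parse_protocols(protocols_text)
    return version_ok and https_ok


if __name__ == "__main__":
    # Sample inputs in the format printed by curl-config (hypothetical values).
    print(libcurl_satisfies_r("libcurl 7.50.1", "HTTP\nHTTPS\nFTP"))  # True
    print(libcurl_satisfies_r("libcurl 7.50.1", "HTTP\nFTP"))         # False
```

Keeping the version test and the protocol test separate makes it easy to report which of the two conditions failed, which is the distinction the thread turns on.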

Emmanuel Charpentier

Oct 27, 2016, 6:13:39 AM
to sage-devel


On Thursday, October 27, 2016 at 11:29:12 AM UTC+2, François wrote:
> R's package system includes a facility for downloading from repositories.
> That’s why you need curl. At least in theory.

In practice, web access to (a lot of) data repositories, databases, etc. is used more and more: there is no point in forcing users to download a whole (WHO|CDC|WTO|<you name it>) database when you can query it via some public networking interface. Try to link these heterogeneous data and, without network access, you end up with a very fine mess...

Jean-Pierre Flori

Oct 27, 2016, 6:18:33 AM
to sage-devel
Upstream R devs might be somewhat uncooperative.
I don't plan on pushing anything upstream.

Emmanuel Charpentier

Oct 27, 2016, 6:22:37 AM
to sage-devel
I'm afraid that we don't have much say in the matter: the R core development team has chosen to rely on curl, and their build system will fail without it. Furthermore, among those 9402 packages, a lot of them may have chosen to follow R "guidance" and use curl.

If we choose to use another library, we are effectively forking R and a $#!+load of those packages. Somehow, I doubt that we have the workforce to do that credibly...

So, before coming to these extremities, I'd like to explore two avenues:
* Question the R Core Team to learn how they reconcile their GPL 2-3 license and their use of OpenSSL, and discuss whether we can use the same loophole as they did. In which case, all is fine and dandy...
* Excise R proper and keep the R interface(s) as optional, reworking them to use an externally installed R. And get rid of the damn beast...

What do you think ?

Francois Bissey

Oct 27, 2016, 6:24:15 AM
to sage-...@googlegroups.com
Although this was not configurable by the user in R-3.2.x, it would build if
libcurl wasn’t found or was missing https support.
The change to bail out if you don’t fulfil all the conditions appears
deliberate to me. And I see the point from a support point of view.

My conclusion is that R developers would file it as "not a bug":
"we really want this."

François

Jean-Pierre Flori

Oct 27, 2016, 6:28:57 AM
to sage-devel


On Thursday, October 27, 2016 at 12:22:37 PM UTC+2, Emmanuel Charpentier wrote:
> Question the R Core Team to learn how they reconcile their GPL 2-3 license and their use of OpenSSL, and discuss whether we can use the same loophole as they did. In which case, all is fine and dandy...

I don't know how much they thought about it, or rather how many lawyers they paid, but if they ship binaries including libcurl and openssl, that might be risky.
Some people invoke the special exception clause of the GPL...
See:
* https://people.gnome.org/~markmc/openssl-and-the-gpl.html
* https://groups.google.com/forum/#!searchin/sage-devel/openssl|sort:relevance/sage-devel/mbGbpRz96q0/8nsKRNyLs3oJ (OpenSSL license issue)
* https://groups.google.com/forum/#!searchin/sage-devel/openssl|sort:relevance/sage-devel/Jl11JxIb2E8/-1NrQmjbMKIJ (why GnuTLS, which is GPL, is not a drop-in replacement)

If they rely on users having their own libcurl and openssl, then there is no problem.
 

Jeroen Demeyer

Oct 27, 2016, 6:31:00 AM
to sage-...@googlegroups.com
On 2016-10-27 12:24, Francois Bissey wrote:
> Although this was not configurable by the user in R-3.2.x, it would build if
> libcurl wasn’t found or was missing https support.
> The change to bail out if you don’t fulfil all the conditions appears
> deliberate to me. And I see the point from a support point of view.
>
> My conclusion is that R developers would file it as "not a bug":
> "we really want this."

Can we please develop software without the a priori assumption that
upstream won't cooperate?

If we submit a patch and they don't accept it, so be it. But we should
at least try.


Jeroen.

Emmanuel Charpentier

Oct 27, 2016, 6:33:10 AM
to sage-devel


On Thursday, October 27, 2016 at 12:28:57 PM UTC+2, Jean-Pierre Flori wrote:
>> Question the R Core Team to learn how they reconcile their GPL 2-3 license and their use of OpenSSL, and discuss whether we can use the same loophole as they did. In which case, all is fine and dandy...
>
> I don't know how much they thought about it, or rather how many lawyers they paid, but if they ship binaries including libcurl and openssl, that might be risky.

One more reason to ask them...

> Some people invoke the special exception clause of the GPL...
> See:
> * https://people.gnome.org/~markmc/openssl-and-the-gpl.html
> * https://groups.google.com/forum/#!searchin/sage-devel/openssl|sort:relevance/sage-devel/mbGbpRz96q0/8nsKRNyLs3oJ (OpenSSL license issue)
> * https://groups.google.com/forum/#!searchin/sage-devel/openssl|sort:relevance/sage-devel/Jl11JxIb2E8/-1NrQmjbMKIJ (why GnuTLS, which is GPL, is not a drop-in replacement)
>
> If they rely on users having their own libcurl and openssl, then there is no problem.

OK. So why don't we do that as well?
 

Jean-Pierre Flori

Oct 27, 2016, 6:35:01 AM
to sage-devel


On Thursday, October 27, 2016 at 12:31:00 PM UTC+2, Jeroen Demeyer wrote:
> Can we please develop software without the a priori assumption that
> upstream won't cooperate?

I agree.
Someone (but not me, I'm still angry) should file a bug.

Francois Bissey

Oct 27, 2016, 7:10:17 AM
to sage-...@googlegroups.com
It is a most interesting point because it explains why
the R binary installed from epel for RH7.1 and family
isn’t linked to openssl.
If they don’t have a rock-solid argument, and even if they
do, there may no longer be any official R packages from the
big-league binary distros. Unless they patch R, their legal arm
will forbid it.

So either they will stop distributing R or they will patch
en masse.

If they don’t patch, you may still be able to find an R
package for your distro, but it won’t be approved by the
distro.

François

Emmanuel Charpentier

Oct 27, 2016, 9:59:26 AM
to sage-devel


On Thursday, October 27, 2016 at 1:10:17 PM UTC+2, François wrote:
> It is a most interesting point because it explains why
> the R binary installed from epel for RH7.1 and family
> isn’t linked to openssl.

How old is RH7.1? The libcurl requirement was introduced in the R 3.3.0 release (May 3, 2016).

I note that Debian (notoriously twitchy about license issues) didn't seem to have any qualms about packaging 3.3.0 or 3.3.1.

> If they don’t have a rock-solid argument, and even if they
> do, there may no longer be any official R packages from the
> big-league binary distros. Unless they patch R, their legal arm
> will forbid it.

I'm not aware of any such move on Debian.

> So either they will stop distributing R or they will patch
> en masse.

Somehow, I doubt it.

Again, why not ask the R core team how they reconcile OpenSSL and GPL'd R? They are a much larger group than we are; I doubt that "IANAL" holds in their case: if they had any doubts, they have probably consulted people who AAL.

--
Emmanuel Charpentier

Jean-Pierre Flori

Oct 27, 2016, 9:59:51 AM
to sage-devel
Building R against an ssl-less libcurl works (modulo patching configure) and does not seem to add errors to R's test suite.
So I suggest the following:
[123]:
* add curl as a standard package and let it use ssl if present except when making dist tarballs (ticket needs some tweaking in case of dist tarballs).
* patch R unconditionally and build it against the potentially ssl-free curl (patch is ready).

Jean-Pierre Flori

Oct 27, 2016, 10:04:52 AM
to sage-devel


On Thursday, October 27, 2016 at 3:59:26 PM UTC+2, Emmanuel Charpentier wrote:
> I note that Debian (notoriously twitchy about license issues) didn't seem to have any qualms about packaging 3.3.0 or 3.3.1.

I'd say Debian links curl to GnuTLS.

Emmanuel Charpentier

Oct 27, 2016, 10:49:23 AM
to sage-devel

Probably not: my system's curl-config --protocols says that HTTPS is supported. A trial of curl's configure --without-ssl --with-gnutls does not claim to support it (in the summary printed at the end of ./configure...).

However, I'm not sure: Debian's blurb about libcurl3-gnutls says that it supports https, imaps, ldaps, smtps and pop3s.

To be sure, I'll have to set up a virtual machine with a minimal environment (no openssl, of course) and try to set up libcurl (and, if successful, Sage) in that environment. This is time-consuming, so it won't happen very soon...

--
Emmanuel Charpentier
 

Jean-Pierre Flori

Oct 27, 2016, 10:59:02 AM
to sage-devel


On Thursday, October 27, 2016 at 4:49:23 PM UTC+2, Emmanuel Charpentier wrote:
> Probably not: my system's curl-config --protocols says that HTTPS is supported. A trial of curl's configure --without-ssl --with-gnutls does not claim to support it (in the summary printed at the end of ./configure...).

Are you sure you installed the gnutls dev headers?
GnuTLS provides an SSL implementation, not just SSL.

> However, I'm not sure: Debian's blurb about libcurl3-gnutls says that it supports https, imaps, ldaps, smtps and pop3s.
>
> To be sure, I'll have to set up a virtual machine with a minimal environment (no openssl, of course) and try to set up libcurl (and, if successful, Sage) in that environment. This is time-consuming, so it won't happen very soon...

You mean with gnutls?
Or with no ssl/tls at all?

I already did the latter and it works (modulo hacking R's configure).

Jean-Pierre Flori

Oct 27, 2016, 11:00:05 AM
to sage-devel

 (sage-sh) jpflori@gcc1-power7:sage.git$ curl-config --protocols
DICT
FILE
FTP
GOPHER
HTTP
IMAP
POP3
RTSP
SCP
SFTP
SMTP
TELNET
TFTP
(sage-sh) jpflori@gcc1-power7:sage.git$ R --version
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: powerpc64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under the terms of the
GNU General Public License versions 2 or 3.
For more information about these matters see
http://www.gnu.org/licenses/.

Jean-Pierre Flori

Oct 27, 2016, 11:16:02 AM
to sage-devel
But you're right, by default Debian links to openssl: https://packages.debian.org/sid/libcurl3

And indeed curl is not GPL anyway: https://curl.haxx.se/docs/copyright.html
Groumpf.
See https://curl.haxx.se/legal/distro-dilemma.html for more rumbling.

Now what if a GPL application links to libcurl?
Section 6 of https://curl.haxx.se/docs/faq.html says it's a legal issue and they cannot help you.

Emmanuel Charpentier

Oct 27, 2016, 11:17:24 AM
to sage-devel


On Thursday, October 27, 2016 at 4:59:02 PM UTC+2, Jean-Pierre Flori wrote:
> Are you sure you installed the gnutls dev headers?
> GnuTLS provides an SSL implementation, not just SSL.

Yep :
dpkg -l "*tls*" | grep ii
ii  libcurl3-gnutls:amd64     7.50.1-1     amd64        easy-to-use client-side URL transfer library (GnuTLS flavour)
ii  libcurl4-gnutls-dev:amd64 7.50.1-1     amd64        development files and documentation for libcurl (GnuTLS flavour)


However, on the very same machine :
dpkg -l "*open*ssl*" | grep ii
ii  libgnutls-openssl27:amd64 3.5.5-2      amd64        GNU TLS library - OpenSSL wrapper
ii  openssl                   1.0.2j-1     amd64        Secure Sockets Layer toolkit - cryptographic utility
ii  python-openssl            16.1.0-1     all          Python 2 wrapper around the OpenSSL library


(Don't tell me it's inconsistent; I know it: this machine has been updated for 5 years without any reinstallation; since I follow "testing" a bit closely, I tend to accumulate cruft...).

Again, I'm now convinced that a serious test on a clean virtual machine is the only way to be sure...

 
> You mean with gnutls?
> Or with no ssl/tls at all?

I mean a pristine machine (Debian gnome + a browser + minimal development utilities + Sage's minimal requirements + gnutls). Given my pipe to the net and my (home) machine, that might take a while.

> I already did the latter and it works (modulo hacking R's configure).

Are you able to install a package from Sage's R?

Emmanuel Charpentier

Oct 27, 2016, 11:18:40 AM
to sage-devel

No HTTPS. Damn..

Jean-Pierre Flori

Oct 27, 2016, 11:23:58 AM
to sage-devel
As far as installing packages goes, R offers me 27 https mirrors and a 28th option, "HTTP mirrors"; if I go there, it works.
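
The fallback described here, choosing a plain-HTTP mirror when libcurl lacks HTTPS, can be sketched as a filter over mirror URLs. The helper name and the mirror URLs below are hypothetical placeholders (not real CRAN mirrors), and this is not how R itself builds its mirror menu.

```python
from urllib.parse import urlsplit


def usable_mirrors(mirrors, https_supported):
    """Return the mirrors reachable with the available libcurl build."""
    allowed = {"http", "https"} if https_supported else {"http"}
    return [url for url in mirrors if urlsplit(url).scheme.lower() in allowed]


# Hypothetical mirror list, for illustration only.
MIRRORS = [
    "https://mirror-a.example.org/cran",
    "https://mirror-b.example.org/cran",
    "http://mirror-c.example.org/cran",
]

if __name__ == "__main__":
    # With an HTTPS-less libcurl, only the plain-HTTP mirror remains.
    print(usable_mirrors(MIRRORS, https_supported=False))
```

A URL selected this way could then be passed to R through the `repos` argument of `install.packages`.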

William Stein

Oct 27, 2016, 11:23:58 AM
to sage-devel
Hi,

<useless rant>

We've been down this road before with Sage, and it's pretty annoying.
I've personally wasted hundreds of hours on it (GNUtls, openssl, etc.)
Programmers playing lawyers have ended up with a broken and
inconsistent legal foundation. There is no easy way out, since only
copyright owners can change licenses. Because this is volunteer open
source, much of the generation that got us into this mess is MIA (or
even dead in some cases). Sigh...

</useless rant>

-- William

--
William (http://wstein.org)

Jean-Pierre Flori

Oct 27, 2016, 11:25:51 AM
to sage-devel


On Thursday, October 27, 2016 at 5:16:02 PM UTC+2, Jean-Pierre Flori wrote:
> But you're right, by default Debian links to openssl: https://packages.debian.org/sid/libcurl3
>
> And indeed curl is not GPL anyway: https://curl.haxx.se/docs/copyright.html
> Groumpf.
> See https://curl.haxx.se/legal/distro-dilemma.html for more rumbling.

Apart from the linking issue, doesn't it imply that we cannot ship curl sources?

I seem to remember something about Sage already violating the GPL, as it is GPL v3+ and ships some GPL v2-only code.
No idea about the implications for LGPL and other funny stuff.

Jean-Pierre Flori

Oct 27, 2016, 11:27:03 AM
to sage-devel


On Thursday, October 27, 2016 at 5:23:58 PM UTC+2, William wrote:
> <useless rant>
>
> We've been down this road before with Sage, and it's pretty annoying.
> I've personally wasted hundreds of hours on it (GNUtls, openssl, etc.)
> Programmers playing lawyers have ended up with a broken and
> inconsistent legal foundation. There is no easy way out, since only
> copyright owners can change licenses. Because this is volunteer open
> source, much of the generation that got us into this mess is MIA (or
> even dead in some cases). Sigh...
>
> </useless rant>

Yup it's a uselessly messy situation :/
I certainly do not want to play lawyers.

William Stein

Oct 27, 2016, 11:28:41 AM
to sage-devel
On Thu, Oct 27, 2016 at 8:25 AM, Jean-Pierre Flori <jpf...@gmail.com> wrote:
>
>
> On Thursday, October 27, 2016 at 5:16:02 PM UTC+2, Jean-Pierre Flori wrote:
>>
>> But you're right, by default Debian links to openssl:
>> https://packages.debian.org/sid/libcurl3
>>
>> And indeed curl is not GPL anyway:
>> https://curl.haxx.se/docs/copyright.html
>> Groumpf.
>> See https://curl.haxx.se/legal/distro-dilemma.html for more rumbling.
>>
> Apart from the linking issue, doesn't it apply that we cannot ship curl
> sources?
>
> I seem to remember that stuff about Sage already violating the GPL as it is
> GPL v3+ and ships some GPL v2 only code.

Such as? I don't think this is the case.


--
William (http://wstein.org)

Jean-Pierre Flori

Oct 27, 2016, 11:34:08 AM
to sage-devel

kcrisman

Oct 27, 2016, 2:15:38 PM
to sage-devel

>> So either they will stop distribute R or they will patch
>> en-masse.
>
> Somehow, I doubt it.

Probably nobody even bothered to notice or notify e.g. Debian?

Thanks for working on this; how annoying. 

Jean-Pierre Flori

Oct 28, 2016, 5:19:20 AM
to sage-devel


On Thursday, October 27, 2016 at 8:15:38 PM UTC+2, kcrisman wrote:
> Probably nobody even bothered to notice or notify e.g. Debian?

I think people at Debian are well aware of such cases.
Not sure what they'll do about it.

Yet another interesting discussion, for freeradius; they do add a specific exception for openssl...
* https://bugzilla.redhat.com/show_bug.cgi?id=317271

> Thanks for working on this; how annoying.

Enough for me, I'm gonna stop here :)

Emmanuel Charpentier

Oct 28, 2016, 6:33:42 AM
to sage-devel
I just checked (by installation on a virtual machine) that a *virgin* (base + destktop + usual utilities) debian stable (jessie) has openssl installed. Tentatively asking for its removal (apt-get remove -s openssl) tells that it would remove a ton of system utilities.

The same is true on testing (stretch).

Either Debian isn't aware of the situation (which would greatly surprise me after all this time) or ... the situation is not as bad as it looks to some.

More on this to follow: we have several solutions (from loopholes to triumphal arches...).

--
Emmanuel Charpentier

Jean-Pierre Flori

Oct 28, 2016, 7:13:10 AM
to sage-devel


On Friday, October 28, 2016 at 12:33:42 PM UTC+2, Emmanuel Charpentier wrote:
> I just checked (by installation on a virtual machine) that a *virgin* (base + destktop + usual utilities) debian stable (jessie) has openssl installed. Tentatively asking for its removal (apt-get remove -s openssl) tells that it would remove a ton of system utilities.

desktop :)

Emmanuel Charpentier

Oct 28, 2016, 8:23:23 AM
to sage-devel

Indeed. But that's a realistic use case...

Emmanuel Charpentier

Oct 28, 2016, 9:39:08 AM
to sage-devel
My thoughts so far :

I : Is there really a problem ?
=====================

What all the brouhaha around libcurl boils down to is that there *might* be a (pseudo-)legal difficulty in shipping a libcurl library requiring OpenSSL and a GPL-licensed piece of software *in the same package*. This might be part of the reason the R core team threw the towel on this (but probably only part of the reason: they also threw the towel on xz and pcre, which do not give the same headache).

However, this does not seem to be a problem per se: Debian (one of the most nitpicking distros in terms of licensing) happily ships libraries and utilities (such as cups, for starters) linked with an openssl-linked libcurl. I think that it would be interesting to ask them how they reconcile the (inconsistent) wordings of the licenses involved.

Depending on their answer, we might have an easy way out: hide behind the same legal curtain as Debian (it remains to be seen what it is...), package libcurl (essentially done) and silently ship it with Sage (it remains to check whether other libraries are more or less silently involved in the support of https: in libcurl, in which case we might have to use them also). This is option :

a) Do nothing :
--------------------

II : If there is really a problem, what can we do ?
===================================

We might also bite the bullet, modify our licensing terms to add the advertising clause required by openssl's license, and openly ship libcurl. That requires, it seems, explicit permission from all the people who have contributed to Sage, which might prove a difficult (impossible?) task. That gives us option :

b) Acknowledge libcurl
---------------------------------

We can also emulate the R core team, throw the towel and simply add (an https-capable) libcurl to our initial requirements (in README.md, possibly other places), leaving the user with the responsibility of installing it. This is option :

c) Throw the towel
---------------------------

Another possibility in the same vein is to throw the whole linen cabinet: instead of placing on users' shoulders the responsibility of finding libcurl, we might leave them the responsibility of installing R. This is made possible by the fact that R is now largely stable, with well-documented interfaces and few changes, and therefore standardizable. A review of *all* the points of exchange between R and other parts of Sage would be necessary to check what is to be supported. As far as I know, R is sparsely used in "the rest" of Sage. This is option :

d) Excise R kernel
---------------------------

At that point, one might wonder if R should remain a standard part of Sage. Dropping the requirement for R and making the R interfaces an optional part of Sage might also be a solution. But this is possible if and only if the code review necessary to the excision shows no use of R's capabilities in other standard parts of Sage. This is option :

e) Excise R interfaces
--------------------------------

I think that we can forget about creating a network-deprived R : the resulting loss of functionality is so massive that it would become almost useless (to people having a use for R, that is...). I have to add that I would fight such a "solution"...

III : Pros and cons
===================

"Throw the towel" is the laziest option : a few lines of not hard-to-write documentation in a few (harder-to-find ?) places. It buys us nothing in terms of functionality. And leaves us with the responsibility of updating R (a not-so-insignificant task) and large sources, libraries and executables.

"Do nothing" is (almost but not quite) as lazy: porting libcurl is essentially done ; it remains to check if other libraries are required to build an https:-capable libcurl. No other benefits.

"Acknowledge libcurl" seems almost infeasible, due to the necessity of hunting down all the past and present Sage contributors. It would otherwise be the cleanest solution in the eyes of legal-oriented people.

"Excise R kernel" needs a serious bit of work. But it would have its points: document all the uses of R from other parts of Sage, forcing the documentation of these uses, etc... It would also lighten the maintenance of Sage. However, we would be exposed to brutal loss of functionality if/when R changes without warning. Furthermore, paranoid users (such as me :-) would not have to maintain "system" and "Sage" installations of R (not a small task with literally thousands of R packages available...).

"Excise R interfaces" is probably easy to do (modulo the code review necessary to excision); in my not so humble opinion, it would be a serious loss of interest for me and, more generally, a catastrophic mistake in communication: R has been part of Sage since version 3.0 (2008) (if Wikipedia is to be believed), and it would be the first ever *intentional* loss of functionality of Sage. Furthermore, I am a bit skeptical about the maintenance of the R interfaces if they ever become an optional part of Sage: even the (Sage) notebook, which is pretty central, has attracted cruft to the point of becoming unmaintainable...

My short-sighted laziness would go to "Throw the towel" ; my long-term laziness would choose "Excise R kernel" (it could be the former now, the latter afterwards). However, notwithstanding its drawbacks, "do nothing" is almost done.


What do you think ?

--
Emmanuel Charpentier

Julien Puydt

Oct 28, 2016, 9:42:47 AM
to sage-...@googlegroups.com
Hi,

On 28/10/2016 15:39, Emmanuel Charpentier wrote:

> However, this does not seem to be a problem per se : Debian (one of the
> most nitpicking distros in terms of licensing) happily ships libraries
> and utilities (such as cups, for starter) linked with openssl-linked
> libcurl. I think that it would be interesting to ask them how they
> reconcile the (inconsistent) wordings of the licenses involved.

It's easy to ask:

https://www.debian.org/legal/

Snark on #debian-science

Volker Braun

Oct 28, 2016, 9:58:42 AM
to sage-devel
I think you are making it more difficult than it is. I'm pretty sure our binaries already depend on openssl being installed, and we do this under the GPL system library exception. We just can't ship our own openssl (nor would I want to).

So we may just as well include libcurl, linked to the system openssl. 

The caveat of including libcurl is that neither curl nor openssl include the root certs, so to actually download over https:// on OSX you need those as well.

The other caveat of including libcurl is that it (by default) detects and links to everything under the sun, which we would have to disable somehow when building with SAGE_FAT_BINARY=yes

 $ ldd `which curl`
linux-vdso.so.1 (0x00007ffcd5c7e000)
libcurl.so.4 => /lib64/libcurl.so.4 (0x00007f0e6e644000)
libmetalink.so.3 => /lib64/libmetalink.so.3 (0x00007f0e6e435000)
libssl3.so => /lib64/libssl3.so (0x00007f0e6e1e8000)
libsmime3.so => /lib64/libsmime3.so (0x00007f0e6dfc1000)
libnss3.so => /lib64/libnss3.so (0x00007f0e6dc98000)
libnssutil3.so => /lib64/libnssutil3.so (0x00007f0e6da69000)
libplds4.so => /lib64/libplds4.so (0x00007f0e6d865000)
libplc4.so => /lib64/libplc4.so (0x00007f0e6d660000)
libnspr4.so => /lib64/libnspr4.so (0x00007f0e6d420000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f0e6d204000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007f0e6d000000)
libz.so.1 => /lib64/libz.so.1 (0x00007f0e6cde9000)
libc.so.6 => /lib64/libc.so.6 (0x00007f0e6ca27000)
libnghttp2.so.14 => /lib64/libnghttp2.so.14 (0x00007f0e6c806000)
libidn.so.11 => /lib64/libidn.so.11 (0x00007f0e6c5d1000)
libssh2.so.1 => /lib64/libssh2.so.1 (0x00007f0e6c3a3000)
libpsl.so.5 => /lib64/libpsl.so.5 (0x00007f0e6c196000)
libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007f0e6bf48000)
libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007f0e6bc62000)
libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007f0e6ba31000)
libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007f0e6b82c000)
liblber-2.4.so.2 => /lib64/liblber-2.4.so.2 (0x00007f0e6b61d000)
libldap-2.4.so.2 => /lib64/libldap-2.4.so.2 (0x00007f0e6b3cc000)
libexpat.so.1 => /lib64/libexpat.so.1 (0x00007f0e6b19f000)
librt.so.1 => /lib64/librt.so.1 (0x00007f0e6af97000)
/lib64/ld-linux-x86-64.so.2 (0x0000557b08980000)
libssl.so.10 => /lib64/libssl.so.10 (0x00007f0e6ad24000)
libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007f0e6a8c4000)
libunistring.so.2 => /lib64/libunistring.so.2 (0x00007f0e6a594000)
libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007f0e6a384000)
libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007f0e6a180000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007f0e69f66000)
libsasl2.so.3 => /lib64/libsasl2.so.3 (0x00007f0e69d48000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007f0e69b21000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007f0e698ea000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007f0e69677000)
libfreebl3.so => /lib64/libfreebl3.so (0x00007f0e69474000)
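
To make the "links to everything under the sun" point easier to check, here is a minimal sketch (not anything Sage's build system actually does) that parses `ldd`-style output like the listing above and reports libraries outside an allow-list. The function names and the allow-list are hypothetical, chosen only for illustration.

```python
import re

# Matches lines such as "libz.so.1 => /lib64/libz.so.1 (0x...)".
_LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S+)")


def linked_libraries(ldd_output):
    """Return the sonames that appear on the left of '=>' in ldd output."""
    names = []
    for line in ldd_output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            names.append(match.group(1))
    return names


def unexpected_libraries(ldd_output, allowed_prefixes):
    """Return sonames that do not start with any of the allowed prefixes."""
    allowed = tuple(allowed_prefixes)
    return [name for name in linked_libraries(ldd_output)
            if not name.startswith(allowed)]


# A few lines taken from the listing above.
sample = """\
linux-vdso.so.1 (0x00007ffcd5c7e000)
libcurl.so.4 => /lib64/libcurl.so.4 (0x00007f0e6e644000)
libz.so.1 => /lib64/libz.so.1 (0x00007f0e6cde9000)
libldap-2.4.so.2 => /lib64/libldap-2.4.so.2 (0x00007f0e6b3cc000)
"""

# Hypothetical allow-list for a minimal build; adjust to taste.
print(unexpected_libraries(sample, ["libcurl.", "libz.", "libc."]))
```

Something along these lines could be run against a freshly built curl when SAGE_FAT_BINARY=yes, to spot dependencies that crept in from the build host.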

Jean-Pierre Flori

Oct 28, 2016, 10:07:16 AM10/28/16
to sage-devel


On Friday, October 28, 2016 at 3:58:42 PM UTC+2, Volker Braun wrote:
I think you are making it more difficult than it is. I'm pretty sure our binaries already depend on openssl being installed, and we do this under the GPL system library exception. We just can't ship our own openssl (nor would I want to).
Yup, that is the trick most distros use: consider openssl to be part of the operating system's system libraries and invoke the corresponding GPL exception to link GPL stuff to it.
 
If our binaries already depend on openssl being installed, which I don't find really satisfactory, let it be and stop the madness.
Note that if we want to ship binaries without this dependency it is not that hard (configure curl properly and apply a 10 line patch to R, which then won't support https).
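
For whoever wants to verify what a given curl was configured with, a small sketch: `curl --version` prints a "Protocols:" line, and the helper below checks whether `https` is listed there. The helper names are mine, and nothing here is part of Sage or R.

```python
import shutil
import subprocess


def curl_protocols(version_text):
    """Extract protocol names from the 'Protocols:' line of `curl --version`."""
    for line in version_text.splitlines():
        if line.startswith("Protocols:"):
            return line.split(":", 1)[1].split()
    return []


def system_curl_has_https():
    """Return True/False for the curl found on PATH, or None if there is none."""
    curl = shutil.which("curl")
    if curl is None:
        return None
    result = subprocess.run([curl, "--version"],
                            capture_output=True, text=True)
    return "https" in curl_protocols(result.stdout)


print(system_curl_has_https())
```

A curl built without any TLS backend would simply not list `https`, which is the situation the 10 line R patch would have to cope with.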

William Stein

Oct 28, 2016, 11:03:26 AM10/28/16
to sage-devel
Hi,

Regarding the openssl dependency issue, the standard way people
justify getting around it is the "system library exemption", which
allows for GPL'd programs to link in system libraries that are not
GPL'd (otherwise, things like GPL software on MS Windows would be
impossible!). Some links here:

http://opensource.stackexchange.com/questions/2233/gpl-v3-with-openssl-exception

As the person who chose to add R to Sage in the first place, my
instinct on this is that we should **completely and totally remove R
from Sage**. Why?

- Our pexpect based interface to R sucks. It was mostly written by
Mike Hansen and me, so I take the blame. In SageMathCloud Sage
worksheets
we just switched to making the %r mode be implemented using Jupyter's
R kernel, which is way more robust.

- It's easy enough to install R in other ways outside of Sage.
I've heard of a lot of people installing Sage in order to install X
(where X is say Pari or Singular or even Cython at one point); I've
*never* heard of anybody installing Sage in order to get R.

- The main technical reason for installing R into Sage, as opposed
to just finding a system-wide R install, is to ensure that rpy2 -- the
C-level bindings to R -- actually work. However, in retrospect, I
don't think rpy2 is really that great.

- Python stats have come a *LONG* way in the last 10 years, with
libraries like Pandas. Why use rpy2 when you can much more
effectively use pandas and statsmodels and so on.

In my opinion, it would be way, way better to completely remove R from
Sage and instead do the following:

1. Include the R jupyter kernel config files.

2. Include the modern Python stats libraries pandas and
statsmodels in Sage.

Our time would be much better spent supporting 2 than 1. It's
ridiculous that we spend no effort on pandas/statsmodels, and all this
effort on R. That was a strategy that made sense 10 years ago, but
not today.

For example, I recall that there are some issues involving pandas +
statsmodels + the sage preparser. We could put effort into
addressing those, like Robert Bradshaw did with numpy (which used to
be very unhappy with Sage integers, reals, etc.). Fixing this stuff
probably wouldn't be hard, and would make Sage a better environment
for stats. There may be similar remarks around machine learning,
where Python has really come into its own recently (e.g., see
tensorflow).

Anyway, just my two cents. But if anybody was out there wanting to
propose something similar, but worried that the person who included R
in Sage in the first place would be really annoyed -- fear not.

-- William

Dima Pasechnik

Oct 28, 2016, 12:38:20 PM10/28/16
to sage-devel


On Friday, October 28, 2016 at 3:03:26 PM UTC, William wrote:
Hi,

Regarding the openssl dependency issue, the standard way people
justify getting around it is the "system library exemption", which
allows for GPL'd programs to link in system libraries that are not
GPL'd (otherwise, things like GPL software on MS Windows would be
impossible!).   Some links here:

     http://opensource.stackexchange.com/questions/2233/gpl-v3-with-openssl-exception

As the person who chose to add R to Sage in the first place, my
instinct on this is that we should **completely and totally remove R
from Sage**.  Why?

  - Our pexpect based interface to R sucks.  It was mostly written by
Mike Hansen and me, so I take the blame.  In SageMathCloud Sage
worksheets
we just switched to making the %r mode be implemented using Jupyter's
R kernel, which is way more robust.

  - It's easy enough to install R in other ways outside of Sage.
I've heard of a lot of people installing Sage in order to install X
(where X is say Pari or Singular or even Cython at one point); I've
*never* heard of anybody installing Sage in order to get R.

  - The main technical reason for installing R into Sage, as opposed
to just finding a system-wide R install, is to ensure that rpy2 -- the
C-level bindings to R -- actually work.

What would prevent it from working if R is not included in Sage?
rpy2 is pip-installable.

Doing "pip install rpy2" on my system python (3.5) gave me a running version of
rpy2 (talking to system-wide R 3.3.1).
I didn't try with python2, but they say it's supported this way too.

Just make rpy2 an optional Sage package...

Dima

William Stein

Oct 28, 2016, 1:24:35 PM10/28/16
to sage-devel
Since you ask, some of the things I can immediately think of:

1. R isn't installed systemwide (obvious)

2. The version of R that is installed is very old or too new compared
to the version of Sage and rpy2 that are installed.

3. The version of R that the user installed doesn't provide a shared
object library or dynamic link library (maybe on OS X or Windows?)

4. The user installed R in some local location, and there are shared
object libraries there, but Sage's Python knows nothing about this
copy of R or its libraries, so rpy2 won't build.

That said, I'm also for rpy2 being optional and pip installable, as
you propose. Many of the above problems are best addressed by
improving rpy2 and pip, and maybe that's already happened.
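
The failure modes listed above can be probed roughly as follows. This is only a sketch: it relies on `R RHOME` and `R --version`, and it assumes the shared library sits at `lib/libR.so` or `lib/libR.dylib` under R's home directory, which may not hold for every install (Windows is not covered at all). The function names are hypothetical.

```python
import os
import shutil
import subprocess


def libr_candidates(r_home):
    """Paths where R's shared library might live under R_HOME (assumed layout)."""
    names = ["libR.so", "libR.dylib"]
    return [os.path.join(r_home, "lib", name) for name in names]


def probe_system_r():
    """Return a small report about the R found on PATH, or None if absent."""
    r_exe = shutil.which("R")
    if r_exe is None:
        return None  # case 1: no R reachable from PATH
    home = subprocess.run([r_exe, "RHOME"], capture_output=True, text=True)
    r_home = home.stdout.strip()
    version = subprocess.run([r_exe, "--version"],
                             capture_output=True, text=True)
    first_line = version.stdout.splitlines()[0] if version.stdout else ""
    # case 3: an empty list suggests R was built without a shared library
    shared = [p for p in libr_candidates(r_home) if os.path.exists(p)]
    return {"executable": r_exe, "home": r_home,
            "version": first_line, "shared_libs": shared}


print(probe_system_r())
```

Cases 2 and 4 (version mismatch, R in a location Python does not know about) would need a comparison against whatever rpy2 expects, which this sketch does not attempt.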

-- William

Matthias Koeppe

Oct 28, 2016, 5:34:34 PM10/28/16
to sage-devel
On Friday, October 28, 2016 at 8:03:26 AM UTC-7, William wrote:
we should **completely and totally remove R
from Sage**.  

We could also demote the R package to "optional" or "experimental" status.
I'd support that.
 

Nathan Dunfield

Oct 29, 2016, 1:09:13 AM10/29/16
to sage-devel
It's ridiculous that we spend no effort on pandas/statsmodels, and all this
effort on R.  

+1
 
For example, I recall that there are some issues involving pandas +
statsmodels + the sage preparser.  

I use Pandas in the default Sage Interpreter on a daily basis and have only encountered a few quirks, all I think related to the Sage Integer vs. Python Integer distinction. I don't use statsmodels though.
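
One common way around that kind of quirk is to coerce values to builtin Python numbers before handing them to pandas. The sketch below is an illustration only: `to_builtin` and `FakeInteger` are hypothetical names, and it assumes the non-builtin integer type implements `__index__` (and that other number types implement `__float__`).

```python
import operator
from fractions import Fraction


def to_builtin(x):
    """Coerce a number-like object to a plain Python int or float if possible."""
    if isinstance(x, (bool, int, float, str)):
        return x
    try:
        return operator.index(x)      # exact integers, via __index__
    except TypeError:
        pass
    try:
        return float(x)               # anything else offering __float__
    except (TypeError, ValueError):
        return x                      # leave unknown objects untouched


class FakeInteger:
    """Stand-in for a non-builtin integer type (hypothetical, for the demo)."""

    def __init__(self, value):
        self._value = value

    def __index__(self):
        return self._value


row = [FakeInteger(3), Fraction(1, 2), "label"]
print([to_builtin(v) for v in row])
```

The cleaned list can then be passed to a DataFrame constructor without dragging foreign number types into the columns.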

Nathan

Dima Pasechnik

Oct 29, 2016, 3:03:32 PM10/29/16
to sage-devel
I talked to R people here at GSoC summit, and they say it should not be a problem to have rpy2 install correctly on
any platform they support (the libraries rpy2 needs are always bundled, and the non-standard location of R is supported too.)

Emmanuel Charpentier

Oct 30, 2016, 4:19:04 AM10/30/16
to sage-devel
OK. It seems that a clear consensus exists for the excision of R _proprio dictu_ from Sage.

If we break it, can we at least keep the (Sage's) pieces ? An optional package offering the current R interface(s), depending on a systemwide (or at least user-reachable) version of R and the R library, might offer the functionality without entailing the (not so trivial) work of maintaining our own R port.

Do you have an idea of the amount of work involved ? And how to do it ? The rpy2 part is probably easy, but I expect surprises on the pexpect front (at least if we want to keep compatibility with existing code using current R interface facilities...). Any hint on the right starting point(s) would be welcome...

BTW : I have also had a (quite superficial) look at what pandas and statsmodels claim to do. They (and scikit-learn, which looks interesting, and Stan, already available from Python) might indeed be useful for run-of-the-mill descriptive analysis and regression models (and Stan for more exotic modeling). Packaging them for Sage might prove useful.

However, those 9000+ R packages are here for a reason : if some of them (a small minority, IMHO) are obvious PhD-earning byproducts, others aim to solve difficult (if specialized) problems, badly solved (or not even considered) by the aforementioned trio. Keeping a way to reach them from Sage might be important. Hence my plea for an "R interface(s)" package.

Another (IMHO futile) reason for keeping an R interface in our arsenal is "name recognition" : quoting R in a "Materials and methods" section no longer raises questions from reviewers ; somehow, I doubt that pandas and statsmodels get the same answer...

What can you add ? Can someone propose a work plan ?

--
Emmanuel Charpentier

William Stein

Oct 30, 2016, 12:10:31 PM10/30/16
to sage-devel
I'm not thinking about politics and name recognition below - I'm just
providing a technical/engineering perspective.

If anybody is seriously planning to work on the R pexpect interface, I
would personally suggest they consider instead working on the R
Jupyter kernel bridge instead, and maybe create a drop in replacement
for the current pexpect interface that uses the R Jupyter kernel.
This would likely be time well spent. This is what we (mostly Hal
Snyder) have been doing with SageMathCloud, where now the %r mode in
Sage worksheets is implemented entirely using Jupyter (which we did
due to user bug/robustness reports with the sage pexpect interface):

https://github.com/sagemathinc/smc/blob/master/src/smc_sagews/smc_sagews/sage_jupyter.py
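
To give a rough idea of what such a bridge involves (this is a sketch, not the SageMathCloud code linked above): the `jupyter_client` package can start a kernel and hand back its output messages. The helper names below are hypothetical, and the sketch assumes `jupyter_client` is installed and that IRkernel is registered under the kernel name "ir".

```python
def collect_text(messages):
    """Concatenate the text carried by a sequence of Jupyter iopub messages."""
    parts = []
    for msg in messages:
        kind = msg["header"]["msg_type"]
        content = msg["content"]
        if kind == "stream":
            parts.append(content["text"])
        elif kind in ("execute_result", "display_data"):
            parts.append(content["data"].get("text/plain", ""))
        elif kind == "error":
            parts.append("%s: %s" % (content["ename"], content["evalue"]))
    return "".join(parts)


def eval_in_kernel(code, kernel_name="ir", timeout=60):
    """Run `code` in a freshly started kernel and return its textual output.

    Assumes jupyter_client is installed and that a kernel spec called
    `kernel_name` exists on this machine.
    """
    from jupyter_client.manager import start_new_kernel

    km, kc = start_new_kernel(kernel_name=kernel_name)
    try:
        messages = []
        kc.execute_interactive(code, timeout=timeout,
                               output_hook=messages.append)
        return collect_text(messages)
    finally:
        kc.stop_channels()
        km.shutdown_kernel()
```

With R and IRkernel installed, `eval_in_kernel("1 + 1")` should return R's printed result as a string. A drop-in replacement for the pexpect interface would keep one kernel alive between calls rather than starting a new one each time.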

Some history - I wrote those (stupid) pexpect interfaces in
2005-2006 as a way to bootstrap making Python talk to everything else.
However, they are basically the worst *viable* approach to the
problem. Jupyter kernels are potentially a lot more work than using
pexpect, but they are clearly a better approach.

Yes, using pexpect is one way to write a Jupyter kernel, but
fortunately there are better ways. For example, the Jupyter kernel is
1000(s) of lines of nontrivial code written in the R language, which
is being improved all the time due to extensive use by users. The
Jupyter R kernel is a serious actively developed project:

https://github.com/IRkernel/IRkernel

Compare that to the Sage R pexpect interface...

https://github.com/sagemath/sage/commits/master/src/sage/interfaces/r.py



-- William


--
William (http://wstein.org)

Emmanuel Charpentier

Oct 30, 2016, 3:09:43 PM10/30/16
to sage-devel
Dear William,

thanks for this advice, which I'll consider seriously, notwithstanding its total opacity to me at the moment....

I just checked that the r interface in SMC is indeed different from what exists in sage "standalone". It is also a bit difficult to understand, due to present lack of documentation. Cut-n'paste from a sagews :

help("r")
︡983aebe0-3f6e-46ff-9d40-ce8e153480d2︡{"stdout":"no Python documentation found for 'r'\n\n"}︡{"done":true}︡


But the behavior of R in % cells is crystal-clear. The first difficulty I have is to understand how to exchange data (or other structures) between R and Sage (which is the *whole* point of the exercise...).

Do you think that this R interface, specific to SageMathCloud, could be ported to Sage ? And do you think that it would involve R-version specific code ?

Another potential stumbling point is that the main part of the interface (the IRkernel package) has not (yet) been accepted by CRAN, rendering its future availability questionable.

I plan to look also at what is offered by the rpy2 interface. It would take a bit of work to emulate the behavior of the pexpect interface, but that might offer a good interim solution (at this time, I have *no* idea about what the IRkernel and its companion libraries are supposed to do).

Last recourse : the R library itself. Again, I don't know (yet) what it offers.

Again, thank you for your (a bit frightening...) advice,

--
Emmanuel Charpentier

William Stein

Oct 30, 2016, 3:33:01 PM10/30/16
to sage-devel
On Sun, Oct 30, 2016 at 12:09 PM, Emmanuel Charpentier
<emanuel.c...@gmail.com> wrote:
> Dear William,
>
> thanks for this advice, which I'll consider seriously, notwithstanding its
> total opacity to me at the moment....
>
> I just checked that the r interface in SMC is indeed different from what
> exists in sage "standalone". It is also a bit difficult to understand, due
> to present lack of documentation. Cut-n'paste from a sagews :
>
> help("r")
> ︡983aebe0-3f6e-46ff-9d40-ce8e153480d2︡{"stdout":"no Python documentation
> found for 'r'\n\n"}︡{"done":true}︡
>
> But the behavior of R in % cells is crystal-clear. The first difficulty I
> have is to understand how to exchange data (or other structures) between R
> and Sage (which is the *whole* point of the exercise...).
>
> Do you think that this R interface, specific to Sagemath Cloud, could be
> ported to Sage ? And do you think that it would involve R-version specific
> code ?

I don't think this question really makes any sense. What you have to
do is simply learn Jupyter... then rethink the question. Except
that's harder than it should be...! I just posted this to
https://gitter.im/jupyter/jupyter: "Hi, I just spent a shockingly
large amount of time searching and googling for basically an overview
of what a Jupyter kernel is and how to write one. There's some old
deprecated docs about this, which all have big pointers to newer docs.
I can't find any new docs that simply explain what a Jupyter kernel is
and how to write one. I'm trying to convince other Sage developers to
use Jupyter kernels instead of the pexpect interfaces I wrote for Sage
-- at least for R mode -- and it's impossible to convince them if I
can't even point to the docs. So help... For example, going here, I
would expect under "Kernels" the docs I want, but I can't find
anything."

>
> Another potential stumbling point is that the main part of the interface
> (the IRkernel package) has not (yet) been accepted by CRAN, rendering its
> future availability questionable.
>
> I plan to look also at what is offered by the rpy2 interface. It would take a
> bit of work to emulate the behavior of the pexpect interface, but that might
> offer a good interim solution (at this time, I have *no* idea about what the
> IRkernel and its companion libraries are supposed to do).
>
> Last recourse : the R library itself. Again, I don't know (yet) what it
> offers.
>
> Again, thank you for your (a bit frightening...) advice,

Sorry for being scary. It's just advice. Feel free to ignore, of course.

William Stein

Oct 30, 2016, 4:04:39 PM10/30/16
to sage-devel
--
Sent from my massive iPhone 6 plus.

Emmanuel Charpentier

Oct 30, 2016, 4:18:10 PM10/30/16
to sage-devel
Dear William,

Sorry to have sounded frightened by *you* : I'm frightened by the amount of *my* ignorance...

Since it seems that I'm (almost) alone among Sage users to be interested in the development of an interface to R, I'm exploring the options. And I'm trying to understand things I was used to *use* (as you use your car without thinking about the thousands of engineering problems that had to be solved to make it "just run"). So I try to list all the possible solutions and rank them also by a very personal criterion : the amount of new knowledge I will need to use them... Your description of Jupyter kernels makes me put it at the far end of my list...

You do not have to convince me that pexpect is a poor way to do it : I'm convinced : the data interchange involves a serious amount of playing fast and loose with Python and R dumps. Not quite streamlined...

Sincerely,

--
Emmanuel Charpentier


William Stein

Oct 30, 2016, 6:02:35 PM10/30/16
to sage-...@googlegroups.com


On Sunday, October 30, 2016, Emmanuel Charpentier <emanuel.c...@gmail.com> wrote:
Dear William,

Sorry to have sounded frightened by *you* : I'm frightened by the amount of *my* ignorance...

Since it seems that I'm (almost) alone among Sage users to be interested in the development of an interface to R, I'm exploring the options. And I'm trying to understand things I was used to *use* (as you use your car without thinking about the thousands of engineering problems that had to be solved to make it "just run"). So I try to list all the possible solutions and rank them also by a very personal criterion : the amount of new knowledge I will need to use them... Your description of Jupyter kernels makes me put it at the far end of my list...

You do not have to convince me that pexpect is a poor way to do it : I'm convinced : the data interchange involves a serious amount of playing fast and loose with Python and R dumps. Not quite streamlined...

For data interchange I think rpy2 will be (by far) the optimal approach. 
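
As a minimal illustration of what that looks like in practice (a sketch, assuming rpy2 and a working R are installed; `r_summary` is a hypothetical helper name): a Python list is turned into an R numeric vector, R functions are called on it, and the results come back as Python floats, with no string parsing involved.

```python
def r_summary(values):
    """Compute mean and sd in R via rpy2; return None if rpy2/R is unusable."""
    try:
        import rpy2.robjects as ro
    except Exception:  # rpy2 missing, or R itself not found at import time
        return None
    vec = ro.FloatVector(values)       # Python list -> R numeric vector
    mean = ro.r["mean"](vec)[0]        # R result -> Python float
    sd = ro.r["sd"](vec)[0]
    return {"mean": mean, "sd": sd}


print(r_summary([1.0, 2.0, 3.0]))
```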

William

 



kcrisman

Oct 31, 2016, 9:58:14 PM10/31/16
to sage-devel
All I know is this:

1) We have "sold" a lot of Sage by saying it's all in there, and at least some people have used Sage+R effectively.  Estimates of how many vary wildly.  But non-zero.
2) rpy2 might be there, but as far as I can tell most people who've used Sage+R use it via the "dumb pexpect interface".
3) All those thousands of packages are darn useful and can't be easily replaced with pandas or anything else.
4) kernels are nice but how do you get Sage command line to do that?  I personally would only know how to do it, sort of, in Jupyter notebook, and not at all clear how to send stuff back and forth between Sage and R there.

My conclusion:

1) Definitely don't remove R from Sage, unless we go back on the mission statement.  We want to be able to easily use all those extra packages with algebra and graph theory.  (Not to mention current users.)
2) Finding a way to deprecate (!!!) current R behavior is viable, but should be a longer-than-normal period.  Including graphics and other stuff.  If rpy2 is the answer, great; I recall it being deemed not as satisfactory at some point but maybe no one ever tried since pexpect was "good enough".
3) Excellent idea to integrate pandas or whatever the current flavor of the month is far better!
4) But if no one is willing/able/whatever to make these new interfaces, keeping pexpect is definitely far better than simply jettisoning R.

(Note e.g. people who have used Sage cell server to embed R graphics in web pages - not sure how that would be affected, if it's a separate R process or via Sage?)

William Stein

Oct 31, 2016, 10:23:41 PM10/31/16
to sage-devel
On Mon, Oct 31, 2016 at 6:58 PM, kcrisman <kcri...@gmail.com> wrote:
> All I know is this:
>
> 1) We have "sold" a lot of Sage by saying it's all in there, and at least
> some people have used Sage+R effectively. Estimates of how many vary
> wildly. But non-zero.

Nobody is suggesting deprecating the potential to use Sage+R.

> 2) rpy2 might be there, but as far as I can tell most people who've used
> Sage+R use it via the "dumb pexpect interface".

rpy2 is vastly better than anything else when large amounts of data
need to be shared back and forth between Sage and R. It's the only
option in that case.

> 3) All those thousands of packages are darn useful and can't be easily
> replaced with pandas or anything else.

This is the same as 1.

> 4) kernels are nice but how do you get Sage command line to do that? I
> personally would only know how to do it, sort of, in Jupyter notebook, and
> not at all clear how to send stuff back and forth between Sage and R there.
>

In SageMathCloud we've been supporting actual users of Sage, R, etc.,
for a few years now. People frequently request that we install R
packages. As far as I can tell, 100% of the time they are using R
directly, via scripts, via %r mode in a notebook (either Sagews or
Jupyter). I've never once heard of anybody actually trying to use
Sage + R yet. As far as I can tell, Sage simply doesn't add anything
of value that people care or know about when they are using R... If
people are using Python to do stats they use the Python tools; if they
are using R, then R is already good enough for what they are doing,
and Sage doesn't add anything.

> My conclusion:
>
> 1) Definitely don't remove R from Sage, unless we go back on the mission
> statement.

The mission statement is to create a viable open source alternative to
Magma, Maple, Mathematica, and Matlab. Not including R in Sage does
not mean going back on that mission statement.

> We want to be able to easily use all those extra packages with
> algebra and graph theory. (Not to mention current users.)

> 2) Finding a way to deprecate (!!!) current R behavior is viable, but should
> be a longer-than-normal period. Including graphics and other stuff. If
> rpy2 is the answer, great; I recall it being deemed not as satisfactory at
> some point but maybe no one ever tried since pexpect was "good enough".

We could remove R from sage-7.5 and the current behavior would likely
work exactly as is (via pexpect) as long as R is installed somewhere
on the computer. In fact, things would be better, since all install
issues involving R itself would be the responsibility of the R
project.

> 3) Excellent idea to integrate pandas or whatever the current flavor of the
> month is far better!

pandas is not just some "flavor of the month" stats library. It's
been around since at least 2010, and is a sort of de facto standard at
this point. There's at least one O'Reilly book on it.

> 4) But if no one is willing/able/whatever to make these new interfaces,
> keeping pexpect is definitely far better than simply jettisoning R.
> (Note e.g. people who have used Sage cell server to embed R graphics in web
> pages - not sure how that would be affected, if it's a separate R process or
> via Sage?)

Andrey would type "apt-get install R" (or whatever) and be done.





--
William (http://wstein.org)

Emmanuel Charpentier

Nov 1, 2016, 1:41:54 AM11/1/16
to sage-devel
A couple of quick notes ; consistent answer to follow (not soon, alas...) :


On Tuesday, November 1, 2016 at 3:23:41 AM UTC+1, William wrote:
On Mon, Oct 31, 2016 at 6:58 PM, kcrisman <kcri...@gmail.com> wrote:
> All I know is this:
>
> 1) We have "sold" a lot of Sage by saying it's all in there, and at least
> some people have used Sage+R effectively.  Estimates of how many vary
> wildly.  But non-zero.

Nobody is suggesting deprecating the potential to use Sage+R.

That's replacing "It's there" by "It can be done". Not the same thing... unless you're a theologian or a politician.

The difficulty arises from lateral licensing issues, which apparently force us not to ship all that is needed to rebuild "our" R. We could depend on this couple of libraries as being "part of the operating system" (and that's what Debian and a couple of other distros seem to do), but that complicates the requirements.

A first point is that a consensus seems to be forming for depending on R instead. More on this later.

A second, different, point is that we can't maintain the current R interface(s) as standard parts of Sage unless R is either part of Sage or a strict prerequisite (i. e. Sage won't build without it).

A third point, distinct from the previous two, is that William deems the current pexpect interface to be insufficient. Having suffered with it, I tend to concur. But I think that the pexpect() interface is *still* useful.

> 2) rpy2 might be there, but as far as I can tell most people who've used
> Sage+R use it via the "dumb pexpect interface".

rpy2 is vastly better than anything else when large amounts of data
need to be shared back and forth between Sage and R.  It's the only
option in that case.

Seconded, with one exception. Data *can* be kept and managed on the "Sage" side and sent to R through the current pexpect interface (via a suitably built r.assign call). That loses the R facilities for data management. And it leaves us stuck with the task of getting R's results back to Sage via a *string*, of all things... It can be done (I've done it, at least in ad-hoceries), but it's not fun.
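
To show why the string round trip is "not fun", here is a sketch of the kind of ad-hoc parsing it takes to turn R's printed form of a numeric vector back into Python numbers. The function name is hypothetical and only the simplest case is handled (no names, no factors, no scientific-notation surprises beyond what `float` accepts).

```python
import re


def parse_r_numeric_vector(printed):
    """Turn R's printed form of a numeric vector into a list of floats.

    Handles only lines such as " [1] 1.5 2.5", where the leading "[n]" is
    R's index marker, with plain numbers, NA, Inf and -Inf as values.
    """
    special = {"NA": float("nan"), "Inf": float("inf"), "-Inf": float("-inf")}
    values = []
    for line in printed.splitlines():
        line = re.sub(r"^\s*\[\d+\]", "", line)   # drop the "[n]" marker
        for token in line.split():
            values.append(special[token] if token in special else float(token))
    return values


printed = " [1]  1.5  2.5  3.5\n [4] 10.0 -Inf"
print(parse_r_numeric_vector(printed))
```

Every additional R type (named vectors, data frames, lists) needs its own parser of this sort, which is the argument for an interface that passes objects rather than text.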

> 3) All those thousands of packages are darn useful and can't be easily
> replaced with pandas or anything else.

This is the same as 1.

> 4) kernels are nice but how do you get Sage command line to do that?  I
> personally would only know how to do it, sort of, in Jupyter notebook, and
> not at all clear how to send stuff back and forth between Sage and R there.
>

That's a fourth point, distinct from the three already stated. Having at least skimmed the documents retrieved by William, I can see his point, at least from the systems integrator's point of view.

But, IMHO, not what we want to see in Sage.

In SageMathCloud we've been supporting actual users of Sage, R, etc.,
for a few years now.  People frequently request that we install R
packages.  As far as I can tell, 100% of the time they are using R
directly, via scripts, via %r mode in a notebook (either Sagews or
Jupyter).

That's because that is the "easy" way to use Sage as a cloud-available R distribution, which covers a large majority of Sage Cloud users.

Not the same thing as using both R and the rest of Sage, which may be a less popular exercise, but exists.
 
 I've never once heard of anybody actually trying to use
Sage + R yet.

Can you hear me now ?
 
  As far as I can tell, Sage simply doesn't add anything
of value that people care or know about when they are using R...

That's probably true of people using SageMath Cloud as a cloud-installed R distribution. Not what I want to do...
 
If
people are using Python to do stats they use the Python tools; if they
are using R, then R is already good enough for what they are doing,
and Sage doesn't add anything.

It adds a lot : availability of tools much better at expressing *structures* than anything available on the R side (think graph theory, symbolic computation, etc...). That's not as popular among applied statisticians as computing "tests p-values" (the present), but remains the core problem of statistics ("given some data, what is the "right" structure that accounts for them ?", i. e. the future...).

> My conclusion:
>
> 1) Definitely don't remove R from Sage, unless we go back on the mission
> statement.

The mission statement is to create a viable open source alternative to
Magma, Maple, Mathematica, and Matlab.  Not including R in Sage does
not mean going back on that mission statement.

> We want to be able to easily use all those extra packages with
> algebra and graph theory.  (Not to mention current users.)

Don't forget the "engineering-type" and "undergraduate" users of Sage, who might not be number theorists, geometers or algebraists, but may find in Sage a *superior* replacement for Mathematica... This "user base" is often neglected in the discussions in sage-devel. I can think of a few bugs in elementary algebra/calculus that have stood for years after being reported without any attempt to solve them.

> 2) Finding a way to deprecate (!!!) current R behavior is viable, but should
> be a longer-than-normal period.  Including graphics and other stuff.

R graphics don't display in the Jupyter notebook. That's why I updated "our" rpy2 (which does that correctly), thus inadvertently creating a mess: the R instance created by rpy2 is not the same as the one created by pexpect. And cannot be, as far as I know. A possibility I was studying was to use rpy2's instance for our pexpect interface.
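
To illustrate why the two instances cannot share state: rpy2 embeds R inside the Python process, whereas the pexpect interface talks to a separate child process. Below is a minimal analogy only, using a child Python interpreter as a stand-in for the child R process (neither R nor rpy2 is needed; `embedded_state` and `child_sees_x` are illustrative names, not part of Sage or rpy2):

```python
import subprocess
import sys

# State living inside the current process
# (stand-in for the R session embedded by rpy2).
embedded_state = {"x": 42}


def child_sees_x() -> bool:
    """Ask a separate interpreter (stand-in for the pexpect-driven R)
    whether a variable named x exists in its own global namespace."""
    probe = "print('x' in globals())"
    result = subprocess.run(
        [sys.executable, "-c", probe],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"


if __name__ == "__main__":
    print("embedded process has x:", "x" in embedded_state)
    print("child process has x:", child_sees_x())
```

Anything defined in one process has to be explicitly serialized and sent to the other, which is what makes mixing the two interfaces in one session confusing.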

 If
> rpy2 is the answer, great; I recall it being deemed not as satisfactory at
> some point but maybe no one ever tried since pexpect was "good enough".

Agreed...

We could remove R from sage-7.5 and the current behavior would likely
work exactly as is (via pexpect) as long as R is installed somewhere
on the computer.  In fact, things would be better, since all install
issues involving R itself would be the responsibility of the R
project.

> 3) Excellent idea to integrate pandas or whatever the current flavor of the
> month is far better!

pandas is not just some "flavor of the month" stats library.   It's
been around since at least 2010, and is a sort of de facto standard at
this point.  There's at least one O'Reilly book on it.

Hum.. De facto standard for what ?

From the pandas site

pandas consists of the following things

  • A set of labeled array data structures, the primary of which are Series and DataFrame
  • Index objects enabling both simple axis indexing and multi-level / hierarchical axis indexing
  • An integrated group by engine for aggregating and transforming data sets
  • Date range generation (date_range) and custom date offsets enabling the implementation of customized frequencies
  • Input/Output tools: loading tabular data from flat files (CSV, delimited, Excel 2003), and saving and loading pandas objects from the fast and efficient PyTables/HDF5 format.
  • Memory-efficient “sparse” versions of the standard data structures for storing data that is mostly missing or mostly constant (some fixed value)
  • Moving window statistics (rolling mean, rolling standard deviation, etc.)
  • Static and moving window linear and panel regression
And that's all. Data management, a whiff of vanilla descriptive statistics.

Data management is indeed an important (and quite time-consuming) part  of practical applied statistics. But that's not the whole game. Somehow lacking...

> 4) But if no one is willing/able/whatever to make these new interfaces,
> keeping pexpect is definitely far better than simply jettisoning R.

Agreed 100%
 
> (Note e.g. people who have used Sage cell server to embed R graphics in web
> pages - not sure how that would be affected, if it's a separate R process or
> via Sage?)

Andrey would type "apt-get install R" (or whatever) and be done.

How do you do that  in Cygwin ? And I don't even want to think about Macs...

Now, these remarks do not constitute a structured answer to the two latest posts. But I need time to think...

HTH,

--
Emmanuel Charpentier

Samuel Lelievre

Nov 1, 2016, 10:03:19 AM11/1/16
to sage-devel


Fri 2016-10-28 17:03:26 UTC+2, William:
 
   - Python stats have come a *LONG* way in the last 10 years, with
libraries like Pandas.   Why use rpy2 when you can much more
effectively use pandas and statsmodels and so on.

In my opinion, it would be way, way better to completely remove R from
Sage and instead do the following:

   1. Include the R jupyter kernel config files.

   2. Include the modern Python stats libraries pandas and
statsmodels in Sage.

Our time would be much better spent supporting 2 than 1.   It's
ridiculous that we spend no effort on pandas/statsmodels, and all this
effort on R.  That was a strategy that made sense 10 years ago, but
not today.

For example, I recall that there are some issues involving pandas +
statsmodels + the sage preparser.   We could put effort into
addressing those, like Robert Bradshaw did with numpy (which used to
be very unhappy with Sage integers, reals, etc.).  Fixing this stuff
probably wouldn't be hard, and would make Sage a better environment
for stats.   There may be similar remarks around machine learning,
where Python has really come into its own recently (e.g., see
tensorflow).
 
These issues with Sage integers were recently illustrated
by this Ask Sage question:


If anyone knows how to fix that, it would be very appreciated.
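
A common workaround for this class of problem is to convert Sage numbers to native Python types before handing them to a library. Here is a minimal sketch, not a fix for the specific question above: `FakeSageInteger` is a hypothetical stand-in for Sage's `Integer` (so that the example runs without Sage), and `to_native` is an illustrative helper, not an existing Sage or pandas function.

```python
class FakeSageInteger:
    """Hypothetical stand-in for a Sage Integer:
    integer-like, but not an instance of Python's int."""

    def __init__(self, value):
        self._value = int(value)

    def __int__(self):
        return self._value

    def __index__(self):
        return self._value

    def __repr__(self):
        return f"FakeSageInteger({self._value})"


def to_native(value):
    """Recursively convert integer-like wrappers to plain Python ints,
    leaving ordinary Python values untouched."""
    if isinstance(value, (list, tuple)):
        return type(value)(to_native(v) for v in value)
    if isinstance(value, dict):
        return {to_native(k): to_native(v) for k, v in value.items()}
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if hasattr(value, "__index__"):
        return int(value)
    return value


if __name__ == "__main__":
    data = {"a": [FakeSageInteger(1), FakeSageInteger(2)], "b": (3, 4.5)}
    print(to_native(data))
```

In an actual Sage session, writing literals with the `r` suffix (for example `3r`) or switching the preparser off with `preparser(False)` also yields plain Python numbers, which avoids the conversion step altogether.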

kcrisman

Nov 1, 2016, 3:36:59 PM11/1/16
to sage-devel

Nobody is suggesting deprecating the potential to use Sage+R.

That's replacing "It's there" by "It can be done". Not the same thing... unless you're a theologian or a politician.


Thank you.
 

A third point, distinct from the previous two, is that William deems the current pexpect interface to be insufficient. Having suffered with it, I tend to concur. But I think that the pexpect() interface is *still* useful.


Correct.
 

 
 I've never once heard of anybody actually trying to use
Sage + R yet.

Can you hear me now ?
 

I definitely have, and I have heard from those who do.  Don't ask me for a detailed list.  But:

It adds a lot: availability of tools much better at expressing *structures* than anything available on the R side (think graph theory, symbolic computation, etc.). That's not as popular among applied statisticians as computing "tests p-values" (the present), but it remains the core problem of statistics ("given some data, what is the "right" structure that accounts for them?"), i.e. the future...



Right, R is so much more than stats.  It's just ... everything you ever wanted for processing all sorts of numerical data.  Often the only open-source way to deal with a lot of stuff is via some random R package.  And those things would then be far more easily processed in R and sent back to Sage.   I am very excited about (someday) learning how to use pandas better, as a Python-native setup, but for practical purposes there is just too much stuff available with R for many people to just abandon it.
Anyway, count this as a vote for at the very least maintaining the current R interface for continued compatibility for a longer period than the usual deprecation, though I think that changing that would really change what was available in Sage, for the poorer.  If the rpy2 roadblocks can be overcome, obviously so much the better.

Emmanuel Charpentier

Nov 1, 2016, 5:03:17 PM11/1/16
to sage-devel

Auto-casting? That should be solved on pandas' side...

--
Emmanuel Charpentier

Emmanuel Charpentier

Nov 1, 2016, 5:18:44 PM11/1/16
to sage-devel
Dear Karl-Dieter (IIRC, apologies if not...)

I saw your comments to my answer to William. It gives useful points, but I need more time to propose a consistent solution.

One more (side) point: I think that what is currently called statistics is evolving into a discipline that aims at proposing and comparing unobservable structures explaining observed data (with much more emphasis on probability than before). From this point of view, the use of formal tools manipulating such structures is of great importance: logics (not necessarily Boolean), algebraic structures, graphs, etc. will become important to the practicing applied statistician. Hence the importance of a toolbox enabling us to make *conjoint* use of both sets of tools.

And, oh, by the way, numerical analysis has not totally displaced formal methods : to be able to get formal solutions to an integration problem or a differential equations system is still damn important...

--
Emmanuel Charpentier

Emmanuel Charpentier

Nov 2, 2016, 6:19:52 AM11/2/16
to sage-devel
So far, the only plan I can propose which is consistent with the various points that have been stated  is :

I) short-term workaround: add a dependency for Sage, by:
a) depending on libcurl and its development files, or
b) depending on a systemwide R + its development files
II) solve current issues with the pexpect interface (e.g. graphs in Jupyter...) and its documentation (e.g. use of RElement to retrieve an R object).

This solves the immediate problem at hand (which is urgent : more and more R packages won't install on R 3.2.x). But since some have pointed out the inefficiencies of the current R interface and its documentation, we could :

III) Build some streamlined interface ("newR") to R (via rpy2? via IRkernel?). Test it.
IV) Based on newR, build a reasonable wrapper emulating feature-for-feature the current pexpect interface. Test it.
V) Replace the pexpect interface by the wrapper.

This is not urgent, and can be discussed further.
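
Steps III to V amount to an adapter pattern: a thin backend plus a wrapper that keeps the old calling convention. Here is a rough sketch under stated assumptions: `RBackend`, `EchoBackend` and `LegacyStyleR` are hypothetical names, the method names `eval`/`set`/`get` are only chosen to resemble the current interface, and the fake backend merely echoes code instead of running R.

```python
from typing import List, Protocol


class RBackend(Protocol):
    """Hypothetical minimal interface a new backend ("newR") would provide."""

    def run(self, code: str) -> str: ...


class EchoBackend:
    """Fake backend for exercising the wrapper without R:
    it records the code it receives and echoes it back."""

    def __init__(self) -> None:
        self.history: List[str] = []

    def run(self, code: str) -> str:
        self.history.append(code)
        return f"<result of {code}>"


class LegacyStyleR:
    """Wrapper exposing old-style eval/set/get calls on top of any RBackend."""

    def __init__(self, backend: RBackend) -> None:
        self._backend = backend

    def eval(self, code: str) -> str:
        return self._backend.run(code)

    def set(self, name: str, value: str) -> None:
        # Assignment is expressed as R code and delegated to the backend.
        self._backend.run(f"{name} <- {value}")

    def get(self, name: str) -> str:
        return self._backend.run(name)


if __name__ == "__main__":
    r = LegacyStyleR(EchoBackend())
    r.set("x", "c(1, 2, 3)")
    print(r.eval("mean(x)"))
```

Because the wrapper only depends on `run`, the same test suite could be pointed first at the fake backend and later at a real one (rpy2 or a Jupyter kernel), which is what makes step IV testable before step V.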

Now, for Ia vs Ib :
  • Both require modifying the installation documentation and the main configure file. These do not seem to be large modifications (I just need to learn a modicum of autotools in order to do this correctly).
  • Both allow the current pexpect to remain a standard part of Sage.
Ib seems to have a few slight advantages over Ia:
  • R and its development files seem to be well-packaged on the few (Linux) distributions I have checked, as well as Cygwin, and probably are elsewhere (other GNU/Linuxes, assorted Unices). The Macintosh case, as usual, bears a large question mark...
  • Paradoxically, it is probably easier for an end-user to check this requirement than to check the suitability of his libcurl...
  • Ib does not require xz and pcre to become standard packages.
Could we have a vote on the I-II block (which *is* urgent), and on the Ia vs Ib alternative ?
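
For what it's worth, both prerequisites can be probed from a script. The sketch below assumes that `R` and `curl-config` are on the `PATH` when installed, and relies on `curl-config --protocols` printing one protocol name per line; the function names are illustrative, and the check says nothing about whether the development headers are present.

```python
import shutil
import subprocess


def protocols_include_https(curl_config_output: str) -> bool:
    """Return True if the text printed by `curl-config --protocols` lists HTTPS."""
    protocols = {line.strip().upper() for line in curl_config_output.splitlines()}
    return "HTTPS" in protocols


def check_prerequisites() -> dict:
    """Report whether a system R and an HTTPS-capable libcurl appear installed."""
    report = {"R": shutil.which("R") is not None, "libcurl_https": False}
    curl_config = shutil.which("curl-config")
    if curl_config is not None:
        result = subprocess.run(
            [curl_config, "--protocols"],
            capture_output=True,
            text=True,
        )
        report["libcurl_https"] = protocols_include_https(result.stdout)
    return report


if __name__ == "__main__":
    print(check_prerequisites())
```

A real configure-time test would more likely be written with autotools macros; this only shows that either requirement can be verified in a few lines.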

HTH,

Jean-Pierre Flori

Nov 2, 2016, 6:40:30 AM11/2/16
to sage-devel
I would say we even have a shorter-term solution: close the tickets for including curl (modulo adding a SAGE_FAT_BINARY mode to avoid overlinking) and updating R (modulo adding a 6-line patch to not check for https support; note that most of the time https will be supported anyway).

Emmanuel Charpentier

Nov 2, 2016, 9:42:47 AM11/2/16
to sage-devel
Dear Jean-Pierre,


Le mercredi 2 novembre 2016 11:40:30 UTC+1, Jean-Pierre Flori a écrit :
I would say we even have a shorter-term solution: close the tickets for including curl (modulo adding a SAGE_FAT_BINARY mode to avoid overlinking)

?? What do you mean ?
 
and updating R (modulo adding a 6-line patch to not check for https support,

I'm not really comfortable with that, but, for now, it doesn't totally cripple R (some repositories are still http-reachable). Nevertheless, we still have to depend on libcurl but with no version check.
 
note that most of the time https will be supported anyway).

Okay. But that still implies accepting xz and pcre either as Sage dependencies or as standard packages.

OTOH, depending on R and R library automatically ensures that the run-time dependencies are present...

Which ?

--
Emmanuel Charpentier
