renjin cran and gnuR packages

Per Nyfelt

Aug 6, 2020, 7:23:44 AM

to Renjin

Hi,

Is there any way we could "open up" or "make accessible" the source to the packages that are on renjin cran?

Typically when someone tries out a package and finds some issue, it is reported as a bug in Renjin. While this may be true, it is not easy to figure out the cause and contribute a fix.

If the package code were mirrored in source form (including the pom file), it would be easier to pinpoint the issue and create a fix, whether that is a patch to Renjin core or a workaround that makes the package work in Renjin. I understand that this could mean Renjin-specific forks of a lot of existing packages, which would be hard to maintain, but it could also mean a lot of improvements contributed to the gcc-bridge etc.

Maybe just a way to get a read-only copy of the code would be enough? Of course, it is possible to download the source from CRAN or BioConductor, create a pom file, and try to figure things out, but I think that would just result in a fork with some Java code working around the issue, which would be hard for others to find and would not really help to advance Renjin.

What do you think?

/Per

Alexander Bertram

Aug 6, 2020, 8:30:01 AM

to Renjin

Hi Per,

That's a very good question. The answer is probably "yes", but the "how" question is easier.

First, an easy pointer: on the packages.renjin.org site, there is a link to a shell script that will check out and rebuild the pom + sources of any package previously built by the CI system. You can find this link on the build log page. For example:

curl http://packages.renjin.org/package/org.renjin.cran/ggplot2/3.2.0/build/8/rebuild.sh | sh

(Of course, reviewing the script before running isn't a bad idea either)
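
For anyone who would rather look before piping into sh, here is a minimal sketch. The function names are made up for illustration, and the URL layout is only inferred from the single example above, so treat it as an assumption rather than a documented scheme.

```shell
#!/bin/sh
# Hypothetical helpers; the URL layout is inferred from the one example above.

# Print the rebuild.sh URL for a given group, package, version and build number.
rebuild_script_url() {
    printf 'http://packages.renjin.org/package/%s/%s/%s/build/%s/rebuild.sh\n' \
        "$1" "$2" "$3" "$4"
}

# Download the script to a local file instead of piping it straight into sh.
fetch_rebuild_script() {
    url=$(rebuild_script_url "$1" "$2" "$3" "$4")
    # -f: fail on HTTP errors; -sS: quiet, but still print errors
    curl -fsS "$url" -o rebuild.sh
}

# Usage:
#   fetch_rebuild_script org.renjin.cran ggplot2 3.2.0 8
#   less rebuild.sh    # review the script
#   sh rebuild.sh      # run it once you are happy with it
```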

You can also use the command line tool to build packages _without_ native code using, for example:

renjin build survey

Where survey is the directory with the original source code. No pom files required.
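
A rough sketch of a pre-flight check around that command. The function name is hypothetical. It assumes only the standard R source package layout (a DESCRIPTION file at the top level, native code under src/) and that `renjin` is on the PATH.

```shell
#!/bin/sh
# Hypothetical wrapper around `renjin build`; not part of the Renjin tooling.
build_if_pure_r() {
    pkg_dir="$1"
    # Every R source package has a DESCRIPTION file at its top level.
    if [ ! -f "$pkg_dir/DESCRIPTION" ]; then
        echo "$pkg_dir does not look like an R source package" >&2
        return 1
    fi
    # Native (C/C++/Fortran) sources live in src/, so skip those packages.
    if [ -d "$pkg_dir/src" ]; then
        echo "$pkg_dir has a src/ directory, so it contains native code" >&2
        return 2
    fi
    renjin build "$pkg_dir"
}

# Usage:
#   build_if_pure_r survey
```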

More broadly, the issue is that we are stuck between two systems.

The build system we built for Renjin 0.7 - 0.9 is a pretty cool powerhouse composed of a Google App Engine app and a Jenkins plugin that's capable of building tens of thousands of packages an hour with a high degree of concurrency. All of the build logs and test results are stored in a Google Datastore project and provide stats on each Renjin "release" in terms of compatibility:

http://packages.renjin.org/qa/dashboard

Most of the builds are fully automated, with a small number of patched packages being pulled from GitHub repos based on name + version, for example:

https://github.com/bedatadriven/org.renjin.cran.wpp2015/tree/patched-1.0-1
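
Assuming other patched packages follow the same naming as this one example (repository named group + package, branch named patched- plus the version), the lookup could be sketched as below. The function name is hypothetical and the pattern is an inference from a single URL, not a documented rule.

```shell
#!/bin/sh
# Hypothetical helper; the naming pattern is inferred from the one example above.
# Arguments: group id, package name, package version.
patched_repo_url() {
    printf 'https://github.com/bedatadriven/%s.%s/tree/patched-%s\n' "$1" "$2" "$3"
}

# Reproduces the example URL above:
patched_repo_url org.renjin.cran wpp2015 1.0-1
```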

This system was pretty neat, but it had serious problems.

1) It could only build and test packages after a Renjin release, so it served poorly as a quality gate: we'd make a fix, release, and then realize it broke tests in a handful of random packages. Ideally, we would want to release only if there were no regressions.

2) Iterative development was rough. Between the CI system and the limitations of multi-module Maven builds, changing a line of code in Renjin and testing it against even a subset of packages was tedious and/or slow, as it required a full Maven rebuild of Renjin, and then either monkeying with Maven to rebuild and test packages locally, or doing a "release" and waiting for results from the build system.

3) There was a lot of noise in the results we did get from the CI system. There are many packages in CRAN and BioConductor that are simply never going to work with Renjin, and sometimes their tests would pass for random reasons, and then subsequently fail etc. Ideally, we would build only the packages that are well suited to running with Renjin.

With Renjin 3.5 the idea was to address these issues by making some big changes to the system.

The first idea was to migrate the build to Gradle, which is much much better at incremental, multi-module builds.

That opened the door to a better release process, especially by using Gradle composite builds, with which we could build Renjin together with a curated list of packages and release the whole constellation together. That way, you know that if you're using, e.g., Renjin 3.5.15, there's a set of packages that will work and work together. And it's also much easier to make and test quick changes to Renjin against a range of packages.

In the middle of all of this, however, we made the decision as a company to concentrate in 2020 on our other product, ActivityInfo, which has taken off in a way that requires significant focus. So right now we are in limbo between a (decommissioned) system based on Maven+Jenkins and the (nearly finished) successor based on Gradle.

Opening up one or both of these systems might be a good solution. What I would suggest is maybe a call with you and anyone else that is interested next Friday? I can show you what we have, and we could look at ways to make it more accessible to a broader group of developers.

Friday, August 14th, 16h00 CET ?

https://meet.google.com/xms-swsj-yih

Best,

Alex

Bertram, Alexander

Aug 6, 2020, 10:52:17 AM

to renji...@googlegroups.com

Sorry, meant to write:

That's a very good question. The answer is probably "yes", but the "how" question is harder.

:-)

--

Alexander Bertram

Technical Director
BeDataDriven BV

Web: http://bedatadriven.com
Email: al...@bedatadriven.com
Tel. Nederlands: +31(0)647205388
Skype: akbertram

Per Nyfelt

Aug 6, 2020, 10:57:03 AM

to Renjin

Thanks for the explanation, Alex! It sounds to me like the best thing would be to finish the new Gradle-based stuff and take it from there. I'd love to help out! Friday next week at 16:00 is fine by me.