Re: Packaging OpenRefine for Debian


Markus Koschany

May 24, 2021, 3:35:39 PM
to openref...@googlegroups.com
Hi,

it took longer than initially expected but I have created the initial
OpenRefine Debian package and pushed it to

https://salsa.debian.org/java-team/openrefine

I intend to wait a little for the next official 3.5 release and upload it to
Debian afterwards. Here are some remarks and suggestions ordered by priority.

1. jquery.lavalamp.min.js is non-free
=====================================

Lavalamp is licensed under CC BY-NC 3.0, which makes it technically non-free: it
cannot be included in Debian main because of the non-commercial clause. Removing
the file obviously removes some of the functionality of the main start page. I
believe it should be possible to work around it or to replace the plugin with one
that has a similar effect. In any case, I had to remove the file from the sources
for now.

2. Minified Javascript files
============================

The webapp includes several minified JavaScript files without the original
non-minified source files. This is acceptable under the licenses of those files,
but it is a reason for Debian's ftp team to reject a package, because Debian
requires the original sources. I suggest including the corresponding
non-minified JavaScript files as well. Actually, I believe minified JS files
don't improve performance that much, since OpenRefine runs on the user's local
system. You could remove them completely and use the non-minified versions
instead. Some files, like date.js, appear to be minified JavaScript as well
despite lacking the .min.js suffix.

3. refine start script
======================

The script can also do things that Debian handles differently. For instance,
building the package or running the tests is distinct from running the
application. I believe the start script should only start OpenRefine and perhaps
pass some configuration parameters. If this is OK with you, I would create and
maintain a Debian-specific start script. For consistency I will rename it to
openrefine and install it in /usr/bin/.

4. refine.ini
=============

I have installed refine.ini to /etc/openrefine/ as another way to configure
OpenRefine. Since we will only package released versions, I believe we should
remove the "Google Data OAuth configuration for developers" part.
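
A Debian-only wrapper of the kind described in sections 3 and 4 could be quite
small. The following is a hypothetical sketch of its configuration-reading part
only, assuming refine.ini consists of plain KEY=VALUE lines with '#' comments.
The helper name ini_value, the OPENREFINE_CONFIG override variable and the
fallback port are illustrative assumptions; the actual java invocation is left
out because the installed classpath is not part of this thread.

```shell
#!/bin/sh
# Hypothetical sketch of the config-reading part of a Debian-only
# /usr/bin/openrefine wrapper. Assumes /etc/openrefine/refine.ini uses
# plain KEY=VALUE lines and that '#' starts a comment.

# OPENREFINE_CONFIG is a hypothetical override, handy for testing.
CONFIG="${OPENREFINE_CONFIG:-/etc/openrefine/refine.ini}"

# Print the value of the last uncommented KEY=VALUE line for KEY.
# Prints nothing if the file is unreadable or the key is absent.
ini_value() {
    key="$1"
    file="$2"
    [ -r "$file" ] || return 0
    sed -n "s/^[[:space:]]*${key}=//p" "$file" | tail -n 1
}

port="$(ini_value REFINE_PORT "$CONFIG")"
port="${port:-3333}"   # 3333 is the port used elsewhere in this thread
memory="$(ini_value REFINE_MEMORY "$CONFIG")"

# A real wrapper would now exec java with Debian's classpath (not shown).
echo "port=$port memory=${memory:-<JVM default>}"
```

Taking the last matching line mirrors what happens when a shell script sources
the file, so a later uncommented entry overrides an earlier one.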

5. Improving OpenRefine's visibility
====================================

I will forward a desktop file and an AppData file, which improve OpenRefine's
visibility in the Linux ecosystem in general. It makes sense to create a manpage
too. Could you provide a 128x128 PNG icon of the OpenRefine logo? An SVG file
would be even better. You could extract the PNG from the icns file with
icnsutils like this:

icns2png -x -s 128x128 -d 32 openrefine.icns


Or I can just send the PNG file to you.
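
Once the icon exists, installing it would follow the freedesktop.org icon theme
layout, where application icons live under
/usr/share/icons/hicolor/<size>/apps/ (and SVGs under scalable/apps/). The helper
below is a hypothetical sketch of that packaging step; the function name, the
staging-root argument and the icon name "openrefine" (which a desktop file would
then reference as Icon=openrefine) are assumptions.

```shell
#!/bin/sh
# Hypothetical packaging helper: place an OpenRefine icon in the hicolor theme.
# $1 = source image, $2 = size such as 128 (or "scalable" for an SVG),
# $3 = optional staging root (DESTDIR), e.g. a package build directory.
install_openrefine_icon() {
    src="$1"
    size="$2"
    destdir="${3:-}"
    case "$size" in
        scalable) target="$destdir/usr/share/icons/hicolor/scalable/apps/openrefine.svg" ;;
        *)        target="$destdir/usr/share/icons/hicolor/${size}x${size}/apps/openrefine.png" ;;
    esac
    # Create the directory tree first so this also works with non-GNU install.
    mkdir -p "$(dirname "$target")" && install -m 644 "$src" "$target"
}
```

In an actual Debian build the same result could also be reached declaratively
with a debhelper .install file; the function just makes the target paths explicit.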

6. Tracking bugs
================

I will create some tickets to make it easier to track some open issues, for
example jdatapath or the 16-year-old version of commons-compress that
openrefine-vicino currently depends on. These could easily get lost on the list.

7. Other work
=============

I have updated jsoup in Debian to the latest upstream release and uploaded it
to experimental to solve a compilation problem with OpenRefine. I still need to
check all the reverse-dependencies but I don't expect major problems.

I will also package the git-commit-plugin. It took a while to realize that the
git.properties file is essential to run the application; packaging the plugin
will make the packaging work a bit simpler.

8. The future
=============

It may still take some time for our ftp team to process OpenRefine and its
dependencies. I could provide a private repository for those who want to
install the Debian packages right now. Let me know if you are interested.

I believe we are on track, and after the upload of 3.5 to Debian I could take a
look at the extensions. Some of them require extra dependencies, but others,
like the Jython extension, should work in Debian right now.

Do you have other suggestions?

Markus



Antonin Delpeuch (lists)

Jun 3, 2021, 7:28:58 AM
to openref...@googlegroups.com
Hi Markus,

Thank you very much for this update! This is very exciting. Replying inline.

On 24/05/2021 21:35, 'Markus Koschany' via OpenRefine Development wrote:
>
> 1. jquery.lavalamp.min.js is non-free
> =====================================
>
> Lavalamp is licensed under BY-NC 3.0 which makes it technically non-free and it
> cannot be included in Debian main because of the non-commercial clause. If I
> remove the file obviously some of the functionality of the main startpage is
> removed. I believe it should be possible to work around it or replace the
> plugin with a similar effect. In any case I had to remove the file from the
> sources for now.

I feared we would run into a problem like that. Thankfully it does not
sound as bad as the org.json dependency in the backend, which was a
real pain to get rid of.

I opened an issue about it:
https://github.com/OpenRefine/OpenRefine/issues/3957

>
> 2. Minified Javascript files
> ============================
>
> The webapp includes several minified Javascript files without providing the
> original non-minified source files. This is acceptable and in accordance with
> the license of those files but it is a reason to reject a package for Debian's
> ftp team because Debian requires the original sources. I suggest to include the
> corresponding non-minified Javascript files as well. Actually I believe
> minified JS files don't improve the performance that much since OpenRefine is
> run on the user's local system. You could remove them completely and use the
> non-minified versions instead. There are some files like date.js which appear
> to be minified Javascript as well despite the missing min.js suffix.

Totally agree with you… I have added an issue about this here:
https://github.com/OpenRefine/OpenRefine/issues/3958

>
> 3. refine start script
> ======================
>
> The script is able to execute some functionality which is handled differently
> by Debian. For instance building the package or running the tests is distinct
> from running the application. I believe the start script should only start
> OpenRefine and maybe pass some configuration parameters. I would create a
> specific start script for Debian only and maintain it if this is OK with you.
> For reasons of consistency I will rename it to openrefine and install it in
> /usr/bin/.

That sounds very good to me. Generally speaking, I would like to make
some progress on a better launcher with a tray icon that could be used
to close the program (instead of having to go through the terminal).

https://github.com/OpenRefine/OpenRefine/issues/3221

>
> 4. refine.ini
> =============
>
> I have installed refine.ini to /etc/openrefine/ as another way to configure
> OpenRefine. I believe we should remove the
> Google Data OAuth configuration for developers part since we will package only
> released versions though.

Actually that's another interesting topic… At the moment we just add
these OAuth ids by hard-coding them in the application just before
packaging (without publishing that patch in git). I assume you cannot do
that in Debian, so if people want to use the Google integration they
would need a way to add these credentials in the .ini file, I think.

Anyway that is only going to become a problem once/if the Google
extension is packaged.

>
> 5. Improving OpenRefine's visibility
> ========================================
>
> I will forward a desktop and appdata file which improves the visibility in the
> Linux ecosystem in general. It makes sense to create a manpage too. Could you
> provide a 128x128 png icon of the OpenRefine logo? Even better would be a svg
> file. You could extract this file from the icns file with icnsutils like that
>
> icns2png -x -s 128x128 -d 32 openrefine.icns
>
>
> or I can just send the png file to you.

Do you mean it needs to be somewhere in the git repository? Of course we
can do that.

>
> 6. Tracking bugs
> ================
>
> I will create some tickets to ease the tracking of some open issues. For
> example for jdatapath or the 16 year old version of commons-compress on which
> openrefine-vicino currently depends on. This could easily get lost on the list.

For Jdatapath we have this issue:
https://github.com/OpenRefine/OpenRefine/issues/2961

For commons-compress it would make sense to open an issue indeed.

>
> 7. Other work
> =============
>
> I have updated jsoup in Debian to the latest upstream release and uploaded it
> to experimental to solve a compilation problem with OpenRefine. I still need to
> check all the reverse-dependencies but I don't expect major problems.
>
> I will also package the git-commit-plugin. It took a while to realize that the
> git.properties file is essential to run the application and it will make the
> packaging work a bit simpler.

I am not so happy with this plugin actually, it is quite slow. Maybe we
should just use some alternative if it also makes your work easier.

>
> 8. The future
> =============
>
> It still may take some time for our ftp team to process OpenRefine and its
> reverse-dependencies. I could provide a private repository for those who want
> to install the Debian packages right now. Let me know if you are interested.
>
> I believe we are on course and after the upload of 3.5 to Debian, I could take
> a look at the extensions. Some of them require extra dependencies but there are
> others like the Jython extension which should work in Debian right now.

Sounds exciting! Jython is definitely a very useful extension so if it's
within reach I would say that it's really worth the effort.

Antonin

Markus Koschany

Oct 20, 2021, 5:27:36 PM
to openref...@googlegroups.com
Hi,

I wanted to give you a status report about the ongoing efforts to package
OpenRefine for Debian. Nine new dependencies have been accepted into Debian so
far but we still have to wait for fifteen more (including OpenRefine itself).
Unfortunately I don't have any control over the ftp team's review process but I
will keep you posted if there is any progress.

In the meantime I spent some time making the OpenRefine extensions work.
Enabling the jython, pc-axis and phonetic extensions was straightforward. For
the gdata, wikidata and database extensions I have packaged the following new
dependencies for Debian.

1. Bug#996178: ITP: libwikidata-toolkit-java -- Wikidata Toolkit

https://bugs.debian.org/996178
https://tracker.debian.org/pkg/libwikidata-toolkit-java
https://salsa.debian.org/java-team/libwikidata-toolkit-java

2. Bug#996179: ITP: libokhttp-signpost-java -- Signpost extension for signing
OkHttp requests

https://bugs.debian.org/996179
https://tracker.debian.org/pkg/libokhttp-signpost-java
https://salsa.debian.org/java-team/libokhttp-signpost-java

3. Bug#996180: ITP: google-api-services-drive-java -- Google Drive API Client
Library for Java

https://bugs.debian.org/996180
https://tracker.debian.org/pkg/google-api-services-drive-java
https://salsa.debian.org/java-team/google-api-services-drive-java

4. Bug#996182: ITP: google-api-services-sheets-java -- Google Sheets API Client
Library for Java

https://bugs.debian.org/996182
https://tracker.debian.org/pkg/google-api-services-sheets-java
https://salsa.debian.org/java-team/google-api-services-sheets-java

5. Bug#996255: ITP: libowasp-encoder-java -- OWASP Java Encoder Project

https://bugs.debian.org/996255
https://tracker.debian.org/pkg/libowasp-encoder-java
https://salsa.debian.org/java-team/libowasp-encoder-java


In order to package owasp encoder I had to update libowasp-esapi-java.

https://tracker.debian.org/pkg/libowasp-esapi-java

libokhttp-signpost-java required an update of libokhttp-java to enable the
okhttp-tls and okhttp-urlconnection modules.

https://tracker.debian.org/pkg/libokhttp-java

The new Google APIs required a new upstream version of google-http-client-java.

https://tracker.debian.org/pkg/google-http-client-java

Although I'm a member of the Java team, I have filed two bug reports instead of
changing the packages directly, because the Debian developers who packaged
google-http-client and google-api-client need them for the Bazel build system,
and it appears we need to coordinate future upgrades. One of the patches also
looks wrong to me, because it requires reverse-dependencies like OpenRefine to
declare a dependency on google-http-client-java, which should not be necessary.
In a nutshell, this is a Debian problem and we will solve it.

Bug#996693: google-http-client-java: please upgrade to version 1.40.1
https://bugs.debian.org/996693

Bug#996696: google-api-client-java: please drop add_depend.patch
https://bugs.debian.org/996696

jsoup, one of OpenRefine's dependencies, was affected by CVE-2021-37714, so I
updated it to version 1.14.3.

https://tracker.debian.org/pkg/jsoup

httpcomponents-core5 and httpcomponents-client5 were also updated to the latest
upstream releases and I didn't spot any build problems with OpenRefine.

I have filed two issues: a bug report related to an old commons-compress
artifact in vicino

https://github.com/OpenRefine/OpenRefine/issues/4228

and a feature request to add a desktop file, icon and AppStream file to the
packaging directory.

https://github.com/OpenRefine/OpenRefine/issues/4229

Thanks for your fast reaction!

What's next:
============

I will start to backport the new dependencies of OpenRefine to Debian 11
"Bullseye", although not all packages have been accepted into the archive yet. I
hope this will happen by the end of the year.

For now I will add the missing sources of the JavaScript files to the Debian
packaging. I can forward them to you if you like.

https://github.com/OpenRefine/OpenRefine/issues/3958

By the way thanks for removing the non-free lavalamp plugin and replacing
jdatapath!

The extensions seem to work, but more testing and user feedback are required. I
am currently looking into the org.xerial sqlite-jdbc artifact. We don't have an
SQLite JDBC driver in Debian yet, and this one is problematic because it simply
embeds SQLite into its jar file, whereas we usually provide a separate JNI
package. According to their documentation this approach has been abandoned.
This has security and maintenance implications for us, and I'm not sure whether
we should package it at all or whether there is an alternative solution. However,
we already provide JDBC drivers for MariaDB and PostgreSQL, which are powerful
alternatives. Note: we don't ship the MySQL driver anymore, since it has been
replaced by MariaDB's, so it will not be packaged, and I don't think it is
really necessary.

So far

Regards,

Markus




Antonin Delpeuch (lists)

Nov 19, 2021, 2:16:04 PM
to openref...@googlegroups.com
Hi Markus,

Thank you so much again for this impressive work, and sorry I could not
find the time to reply earlier. It is amazing that you are packaging the
extensions too! I do think it is worth the effort indeed.

I have been working on eliminating our custom forks and snapshots of
dependencies and on relying only on officially released versions, hoping
that this helps your efforts.

On 20/10/2021 23:27, 'Markus Koschany' via OpenRefine Development wrote:
> I have filed two bug reports related to an old commons-compress artifact in
> vicino
>
> https://github.com/OpenRefine/OpenRefine/issues/4228
>
> and a feature request to add a desktop file, icon and appstream file to the
> packaging directory.
>
> https://github.com/OpenRefine/OpenRefine/issues/4229

I will be working on those. In terms of release schedule, do let me know
if there is anything that we should release quickly, for instance by
publishing a 3.5.1 soon.

For our 3.6 version, I am not sure you saw that we are dropping Java 8
support because Apache Jena dropped it too and we need to upgrade to
their latest version to fix a vulnerability. I assume this will not be a
problem for Debian since Java 11 is widely available already?



> The extensions seem to work but more testing and user feedback is required. I
> am currently looking into the org.xerial sqlite-jdbc artifact. We don't have a
> sqlite jdbc driver in Debian yet but this one is problematic because they
> simply embed sqlite into their jar file while we usually provide a separate jni
> package. According to their documentation this approach has been abandoned.
> This has security and maintenance implications for us and I'm not sure if we
> should package it at all or if there is an alternative solution. However we
> already provide jdbc drivers for mariadb and postgresql which are powerful
> alternatives. Note: We don't ship the mysql driver anymore which has been
> replaced by mariadb, so this one will not be packaged and I don't think it is
> really necessary.

To be honest, I never really understood why we need vendor-specific JDBC
drivers, so it does make sense to have only one MySQL/MariaDB driver at least.

For SQLite it does make sense to have something different of course, by
SQLite's nature. It makes sense that this is tricky to package. I also
cannot think of an easy solution there. It is totally worth packaging
the database extension without it (it has only been added recently).

Thanks again and do let me know if other useful upstream changes come to
mind!

Best,

Antonin

Thad Guidry

Nov 19, 2021, 2:41:12 PM
to openref...@googlegroups.com
> > The extensions seem to work but more testing and user feedback is required. I
> > am currently looking into the org.xerial sqlite-jdbc artifact. We don't have a
> > sqlite jdbc driver in Debian yet but this one is problematic because they
> > simply embed sqlite into their jar file while we usually provide a separate jni
> > package. According to their documentation this approach has been abandoned.
> > This has security and maintenance implications for us and I'm not sure if we
> > should package it at all or if there is an alternative solution. However we
> > already provide jdbc drivers for mariadb and postgresql which are powerful
> > alternatives. Note: We don't ship the mysql driver anymore which has been
> > replaced by mariadb, so this one will not be packaged and I don't think it is
> > really necessary.
>
> I never really understood why we need vendor-specific JDBC drivers to be
> honest, so it does make sense to have only one MySQL/MariaDB at least.
>
> For SQLite it does make sense to have something different of course, by
> SQLite's nature. It makes sense that this is tricky to package. I also
> cannot think of an easy solution there. It is totally worth packaging
> the database extension without it (it has only been added recently).
>
> Thanks again and do let me know if other useful upstream changes come to
> mind!
>
> Best,
>
> Antonin


I agree with Antonin. We should never package vendor-specific JDBC drivers. Let users handle that on their own, and we can provide guidance via improved documentation on our side.
I think it is fine to leave SQLite out of the packaging. Given some good, updated documentation on our side, it should be easy enough for users to install it properly on their system, whether Linux, Mac or Windows.



Markus Koschany

Nov 21, 2021, 6:34:54 AM
to openref...@googlegroups.com
Hi Antonin,

On Friday, 19.11.2021 at 20:16 +0100, Antonin Delpeuch (lists) wrote:
> Hi Markus,
>
> Thank you so much again for this impressive work, and sorry I could not
> find the time to reply earlier. It is amazing that you are packaging the
> extensions too! I do think it is worth the effort indeed.
>
> I have been working on eliminating our custom forks and snapshots of
> dependencies and only rely on officially released versions, hoping that
> this helps your efforts.

Relying on official versions is always a plus because it reduces the packaging
work in the long term. For now I believe we have pretty much every dependency
we need to run OpenRefine. We are currently down to seven packages that are
waiting in Debian's NEW queue for approval. I can't predict how long it will
take to process them, so I intend to create my own private Debian repository
for Debian 11 "Bullseye" now. This will also help to identify possible
regressions caused by the different library versions in our stable and unstable
distributions.

>
> On 20/10/2021 23:27, 'Markus Koschany' via OpenRefine Development wrote:
> > I have filed two bug reports related to an old commons-compress artifact in
> > vicino
> >
> > https://github.com/OpenRefine/OpenRefine/issues/4228
> >
> > and a feature request to add a desktop file, icon and appstream file to the
> > packaging directory.
> >
> > https://github.com/OpenRefine/OpenRefine/issues/4229
>
> I will be working on those. In terms of release schedule, do let me know
> if there is anything that we should release quickly, for instance by
> publishing a 3.5.1 soon.
>
> For our 3.6 version, I am not sure you saw that we are dropping Java 8
> support because Apache Jena dropped it too and we need to upgrade to
> their latest version to fix a vulnerability. I assume this will not be a
> problem for Debian since Java 11 is widely available already?

Java 11 is the default for Debian 10 and 11 already and we plan to switch to
OpenJDK 17 in our current release cycle. So dropping Java 8 is not a problem,
compatibility with Java 17 is more important to us.

Cheers,

Markus


Markus Koschany

Jan 10, 2022, 5:07:13 PM
to openref...@googlegroups.com
Hi,

here is another status report on the inclusion of OpenRefine in Debian. We are
pretty much at the end of the road now. Six source packages remain that have to
be accepted by Debian's ftp team. I'm in contact with one of the people
responsible, because there were some concerns about copyright issues in Apache
Jena and the ODFDOM library. However, I believe this has been sorted out, and I
hope we can move forward now and clear the rest of the queue.

In the meantime I have backported the newly available OpenRefine dependencies
(22 source packages, which I mentioned in my previous posts) to our current
stable release via bullseye-backports, a suite dedicated to bringing new
software to an otherwise frozen release. Most of them have been accepted already
and can be downloaded once you add the following entry to your
/etc/apt/sources.list file:

deb http://deb.debian.org/debian/ bullseye-backports main



Docker image
============

I have created a Docker image of OpenRefine based on the official Debian Docker
image. There are already plenty of OpenRefine Docker images based on Alpine
Linux and/or OpenJDK 8. My motivation for creating another one was to identify
possible dependency problems; e.g. I discovered that OpenRefine should depend on
procps in order to use the "free" command in the refine start script. This
could have gone unnoticed because procps is pulled in by other dependencies on
almost every standard desktop system.
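
One way to avoid a hard dependency on procps, assuming the start script only
needs a rough figure for available memory, is to fall back to the kernel's
/proc/meminfo when free is missing. The function below is a hypothetical sketch,
not the code in the current refine script. It assumes a Linux kernel that
reports MemAvailable (3.14 or later) and the procps-ng layout of free -m, where
the seventh field of the "Mem:" row is the "available" column.

```shell
#!/bin/sh
# Hypothetical helper: print available memory in MiB.
# With no argument it prefers `free` (procps) and falls back to /proc/meminfo;
# with a file argument it parses that file in meminfo format (useful for tests).
available_mem_mib() {
    if [ -z "${1:-}" ] && command -v free >/dev/null 2>&1; then
        # procps-ng layout: Mem: total used free shared buff/cache available
        free -m | awk '/^Mem:/ {print $7; exit}'
    else
        # The kernel reports MemAvailable in kB (KiB), so divide by 1024.
        awk '/^MemAvailable:/ {print int($2 / 1024); exit}' "${1:-/proc/meminfo}"
    fi
}
```

If neither source is available (e.g. on a non-Linux system), the awk call simply
fails and the script could skip the memory message, which would keep the check
optional.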

I think it's also a nice way to experiment with the official Debian packaging
on non-Linux systems by simply running Docker and inspecting the image. Here is
a quick tutorial to download the image and run the container:

# 1. Download the image
docker pull apo1999/openrefine

# 2. Run OpenRefine in the background, listening on port 3333
#    OpenRefine should now be accessible in your web browser
docker run -d -p 3333:3333 --name openrefine-test apo1999/openrefine

# 3. Open a shell inside the container to inspect it
docker exec -it openrefine-test /bin/bash

# 4. Stop the container
docker stop openrefine-test

# 5. Remove the container
docker rm openrefine-test

# 6. Remove the image
docker rmi apo1999/openrefine

Warning: The image is quite large (1 GB uncompressed, ~600 MB compressed). I
intend to experiment with apt's --no-install-recommends flag in the future,
which should reduce the number of packages, but this image will always be larger
than a plain Alpine or OpenJDK image. Currently I find it useful for development
purposes; later the image will depend on the official packages in
bullseye-backports.

Do you have more Docker ideas? Anything you would like to see implemented?
Creating a volume for persisting data is already on my todo list.

New package updates
===================

I have packaged new upstream releases of httpcomponents-client5,
httpcomponents-core5, openrefine-butterfly, apache-log4j2,
google-http-client-java and google-api-client-java. Thanks to the two latter
ones I could drop a patch for google-api-services-sheets-java, which is needed
for the extensions. I also updated OpenRefine itself to version 3.5.1.

I fixed Debian bug #1002274, a FTBFS (failed to build from source), in jetty9
which was caused by a missing or wrong dependency on the servlet API due to
changes in reverse-dependencies.

OpenRefine 3.5.1
================

I noticed the upgrade to log4j2 because of the prominent Log4Shell security
vulnerability. I suggest moving straight to version 2.17.1, because further CVEs
were discovered and fixed in 2.17.0 and 2.17.1, although they are less severe.

I encountered a compilation problem in the server module because of the
dependency on log4j-slf4j-impl. It builds for me if I change the artifact to
log4j-1.2-api in server/pom.xml. Looking at Refine.java, which uses the
org.apache.log4j.Level class, I'm not sure why you depend on log4j-slf4j-impl
instead, because the Level class is in log4j-1.2-api or log4j-api.

To understand my problem, you need to know that we don't ship the *-impl
artifact because of CVE-2018-8088 in slf4j. Apache Log4j2 still depends on an
older version of slf4j to work around the removal of the EventData class which
makes it impossible for us to build the log4j-slf4j-impl artifact without
patching Apache Log4j2.

What's next
===========

I expect the packaging of OpenRefine and the backport to bullseye to be
completed in 3-6 weeks, provided the ftp team's review process continues to be
positive. I will report back as soon as I can.

Regards,

Markus


Thad Guidry

Jan 10, 2022, 10:46:45 PM
to openref...@googlegroups.com
Thanks Markus!

I'm sure Antonin will comment on a few things.

From my perspective, the free memory test is something that could probably be taken out or made entirely optional in the refine script. We might do that later, or now if it makes things easier. But it is nice that you discovered the procps dependency.
I'm not a Debian user for Docker any longer, but your image and steps worked for me.  We have lots of options for Docker imaging and have talked briefly about it in the past in issue , but I think we decided we were not going to release an official Docker image that we were going to be responsible for and instead let the community continue with their experiments as you are doing.  However Antonin might have changed his mind on this.

1. One thing that I am curious about is the future of Debian and OpenRefine availability.  Will packaging for the next version of Debian be fairly trivial because of all the work you have done?  How will that work exactly?

2. Is bullseye-backports main the stable frozen release? I'm a bit confused about what Debian has been doing in the last 6 years; I was a user before that. Specifically, what, if anything, does a Debian user of OpenRefine need to do with backports.debian.org?

3. What about `free` context, I assume since you state the sources.list should have deb http://deb.debian.org/debian bullseye-backports main that OpenRefine will not be part of contrib and non-free sections?  In other words, a user will not see OpenRefine available when those sections are added, but will see OpenRefine available when they are excluded?  Trying to follow your notes along with the wiki https://wiki.debian.org/Backports

Markus Koschany

Jan 11, 2022, 7:35:31 AM
to openref...@googlegroups.com
> 1. One thing that I am curious about is the future of Debian and OpenRefine
> availability.  Will packaging for the next version of Debian be fairly
> trivial because of all the work you have done?  How will that work exactly?

Hi Thad,

Once the initial packaging work is done we enter the maintenance mode. If
nothing changes, no serious bugs are found and 3.5.1 was the latest upstream
release, then OpenRefine would be released as is and be part of the next Debian
12 "Bookworm" release which is scheduled for 2023. In the meantime Debian users
can already test it by installing Debian unstable or Debian testing which are
the bleeding edge Debian distributions for developers and power users.

In reality, the following will happen. Users will report bugs against
OpenRefine, ranging in severity from "wishlist" (please implement feature X) to
serious and grave (OpenRefine fails to build from source because of some
changes in reverse-dependency Y). I will forward everything related to
OpenRefine itself to its upstream bug tracker and won't bother you with
Debian-related problems.

Since others may start to use the new OpenRefine dependencies for their own
projects I need to double-check if a library upgrade would break their use
case. In general we try to solve the following optimization problem: Package
the latest release of every Java package in Debian, package only one version of
it and don't break existing packages / use cases. Naturally this is not a
simple task.

You should expect that we only need to package new dependencies if a)
OpenRefine starts to depend on new dependencies or b) an existing library
requires a new dependency. The latter happens from time to time but it is less
common than in the initial packaging phase.

Most time will be spent on replying to bug reports, fixing bugs in dependencies
and OpenRefine, and packaging new upstream releases of libraries and OpenRefine
itself. For instance, a release-critical bug in Jetty like the one I fixed
recently would prevent OpenRefine from being released as well, because it
depends on Jetty. These kinds of problems happen frequently.

> 2. Is bullseye-backports main the stable frozen release?  I'm a bit confused
> on what Debian does now in the last 6 years, I was a user prior to that. 
> Specifically, how or what does a Debian user of OpenRefine need to deal with
> backports.debian.org?

Just bullseye is the stable frozen release. bullseye-backports is an additional
suite which is intended for people who want to provide, support and use the
most recent upstream releases of a particular piece of software. It is not
possible to upload anything to stable directly apart from targeted point
updates which address single bugs or security vulnerabilities. OpenRefine users
who prefer Debian stable just have to add this single line to their apt
sources.list and they can install openrefine with

apt install -t bullseye-backports openrefine
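
For users who prefer to script this step, the sketch below appends the
backports line only if it is not already present, so it is safe to run
repeatedly. The function name and the use of a separate file under
/etc/apt/sources.list.d/ are assumptions (apt reads *.list files from that
directory in addition to /etc/apt/sources.list). Note that the package index
has to be refreshed with apt update before apt install -t bullseye-backports
can find openrefine.

```shell
#!/bin/sh
# Hypothetical helper: ensure the bullseye-backports line is present in an
# apt source list file, without adding a duplicate on repeated runs.
BACKPORTS_LINE='deb http://deb.debian.org/debian/ bullseye-backports main'

ensure_backports() {
    list_file="$1"
    # -q: quiet, -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF "$BACKPORTS_LINE" "$list_file" 2>/dev/null; then
        printf '%s\n' "$BACKPORTS_LINE" >> "$list_file"
    fi
}
```

Run as root, the sequence would be
ensure_backports /etc/apt/sources.list.d/bullseye-backports.list, then
apt update, then the install command above.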

The documentation on backports.debian.org is mostly for contributors who want
to upload their software packages to a backports suite. Normal users usually
don't need to bother with it.

>
> 3. What about `free` context, I assume since you state the sources.list
> should have deb http://deb.debian.org/debian bullseye-backports main that
> OpenRefine will not be part of contrib and non-free sections?  In other
> words, a user will not see OpenRefine available when those sections are
> added, but will see OpenRefine available when they are excluded?  Trying to
> follow your notes along with the wiki https://wiki.debian.org/Backports

Of course you can also add the contrib and non-free sections to that line:

deb http://deb.debian.org/debian/ bullseye-backports main contrib non-free

That makes no difference. The point is that you don't need anything from
contrib and non-free, because OpenRefine and its dependencies are entirely free
software, so you can simply omit them. Adding them does no harm either: you will
still be able to install OpenRefine from bullseye-backports even if you also
need the non-free NVIDIA graphics drivers. :)


Thad Guidry

Jan 11, 2022, 8:40:03 AM
to openref...@googlegroups.com
Thanks Markus for the perfect responses.

Antonin Delpeuch (lists)

Jan 25, 2022, 3:23:47 AM
to openref...@googlegroups.com
Hi Markus,

Thank you very much for this progress report! Replying inline.

On 10/01/2022 23:07, 'Markus Koschany' via OpenRefine Development wrote:
>
> Docker image
> ============
>
> I have created a Docker image of OpenRefine based on the official Debian docker
> image. There are already plenty of OpenRefine docker images available based on
> Alpine Linux and/or OpenJDK 8. My motivation to create another one was to
> identify possible dependency problems, e.g. I discovered that OpenRefine should
> depend on procps in order to use the "free" command in the refine start script.
> This could have gone unnoticed because procps is pulled in by other
> dependencies on almost every standard desktop system.

Thanks for pointing that out!

This dependency is used to display the available free memory at startup,
but since the corresponding message is only displayed in the log, it is
not very visible. Ideally this sort of info should be displayed in the
web UI, so we could migrate this check to the backend, using a Java
library such as https://github.com/oshi/oshi. Would that make sense?
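Alternatively, if the check stays in the refine script rather than moving to the backend, the procps dependency could in principle be dropped by reading /proc/meminfo directly instead of calling free. A minimal sketch, assuming Linux and a POSIX shell with awk; `available_mem_mb` is a hypothetical helper, not something the current script contains:

```shell
#!/bin/sh
# Hypothetical helper: print available memory in MiB without procps' `free`.
# Reads /proc/meminfo (Linux only); an alternative file can be passed for testing.
available_mem_mb() {
    meminfo=${1:-/proc/meminfo}
    awk '
        /^MemAvailable:/ { avail = $2 }   # value is in kB
        /^MemFree:/      { free = $2 }
        END {
            # Prefer MemAvailable; fall back to MemFree on older kernels.
            kb = (avail != "") ? avail : free
            if (kb == "") exit 1          # neither field found
            printf "%d\n", kb / 1024
        }
    ' "$meminfo"
}
```

MemAvailable is the kernel's own estimate (Linux 3.14 and later); on older kernels the sketch falls back to MemFree, which usually underestimates what is actually available because it ignores reclaimable page cache.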

There is quite a lot of interest in running OpenRefine via Docker, so I feel it
is worth ironing out this sort of thing.

>
> Do you have more Docker ideas? Anything you would like to see implemented?
> Creating a volume for persisting data is already on my todo list.

I wonder if it would make sense to host this Docker configuration
somewhere that looks official in the project (either the main repository
or another repository in our GitHub organization).

> OpenRefine 3.5.1
> ================
>
> I noticed the upgrade to log4j2 because of the prominent Log4Shell security
> vulnerability. I suggest to move straight to version 2.17.1 because there were
> some newly discovered CVEs, fixed in 2.17.0 and 2.17.1, which are less severe
> though.
>
> I encountered a compilation problem in the server module because of the
> dependency on log4j-slf4j-impl. It works for me if I change the artifact to
> log4j-1.2-api in server/pom.xml now. Looking at Refine.java which makes use of
> the org.apache.log4j.Level class, I'm not sure why you depend on
> log4j-slf4j-impl instead because the Level class is in log4j-1.2-api or log4j-api.

I recently made more changes to this, so I hope the situation is cleaner
now, but we still currently depend on log4j-slf4j-impl. If I remove the
dependency entirely, OpenRefine fails to start properly (no logging
implementation can be found), but you are right that this does not need
to be a compile dependency (just a runtime one). Or should it not be a
dependency at all?

Best,
Antonin

Markus Koschany

Feb 8, 2022, 4:06:54 PM
to openref...@googlegroups.com
On Tuesday, 25.01.2022 at 09:23 +0100, Antonin Delpeuch (lists) wrote:

>
> I wonder if it would make sense to host this Docker configuration
> somewhere that looks official in the project (either main repository, or
> other repository in our GitHub organization).

The Dockerfile is quite simple. My intention was to install it as an example
into /usr/share/doc/openrefine/examples. It is currently stored here

https://salsa.debian.org/java-team/openrefine/-/blob/master/debian/examples/Dockerfile

and currently installs all binary packages from a private directory, but later
it will just download OpenRefine from bullseye-backports.

Sure, if you find it useful, you could add it next to the appdata and desktop
files, possibly in some misc directory.

[...]
>
> I recently made more changes to this, so I hope the situation is cleaner
> now, but we still currently depend on log4j-slf4j-impl. If I remove the
> dependency entirely, OpenRefine fails to start properly (no logging
> implementation can be found), but you are right that this does not need
> to be a compile dependency (just a runtime one). Or should it not be a
> dependency at all?

I need to take a closer look again but at the moment it isn't a big problem.

Cheers,

Markus


Markus Koschany

Mar 2, 2022, 6:56:17 PM
to openref...@googlegroups.com
Hello,

it is done. OpenRefine is now part of Debian.

https://tracker.debian.org/pkg/openrefine


A few days ago version 3.5.2 entered Debian testing. It will also be part of
the upcoming Ubuntu 22.04 release. If you use unstable or testing you can just
install it the usual way with:

apt install openrefine

I noticed that the non-free lavalamp.js file was still present in the 3.5.2
release tarball from GitHub. I guess the fix for

https://github.com/OpenRefine/OpenRefine/issues/3957

was not merged into the 3.x branch? Anyway, I have just applied the fix
manually and removed the file from the sources.

The minified Javascript files are still part of the sources.

https://github.com/OpenRefine/OpenRefine/issues/3958

I took a closer look and added the missing source files to
debian/missing-sources:

https://salsa.debian.org/java-team/openrefine/-/tree/master/debian/missing-sources

I believe it should be fine to ship those files next to the minified ones, so
that users can make modifications if desired. You could probably also just
replace the minified files. I don't believe there would be any noticeable
performance decrease or lag, since OpenRefine runs on your local computer anyway.
Most of the Javascript files were quite old. For instance, the minified datejs
version is from 2007, its last official release. Newer versions have fixed
several bugs, so I took the liberty of using a (now also abandoned) fork
instead, but I am not sure whether it is really a sufficient replacement:

https://github.com/abritinthebay/datejs


Debian backports
================

OpenRefine will be accepted into bullseye-backports shortly (it should only be
a matter of days now), so you can run Debian stable and install it as follows:

1. Edit /etc/apt/sources.list and add the following lines

deb http://deb.debian.org/debian bullseye-backports main
deb-src http://deb.debian.org/debian bullseye-backports main

2. Update the package lists and install openrefine with

apt update
apt install -t bullseye-backports openrefine
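Step 1 can also be scripted so that running it twice does no harm. A minimal sketch, assuming the two lines from step 1 are wanted verbatim in the main sources file; `add_backports_entries` is a hypothetical helper, not part of any Debian tool:

```shell
#!/bin/sh
# Hypothetical helper: append the bullseye-backports entries to an APT
# sources file only if they are not already present (idempotent).
# Defaults to /etc/apt/sources.list (needs root); pass another path to test.
add_backports_entries() {
    list=${1:-/etc/apt/sources.list}
    for entry in \
        'deb http://deb.debian.org/debian bullseye-backports main' \
        'deb-src http://deb.debian.org/debian bullseye-backports main'
    do
        # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
        if ! grep -qxF "$entry" "$list" 2>/dev/null; then
            printf '%s\n' "$entry" >> "$list"
        fi
    done
}
```

Afterwards, apt update is still needed before the install command. The check only recognises lines written exactly as above; an equivalent entry with different spacing, or one in a file under /etc/apt/sources.list.d/, would be added again.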


Docker
======

Once openrefine enters bullseye-backports, I will update the Dockerfile, and
then you can pull it from Docker Hub again with

docker pull apo1999/openrefine

Misc
====

- I packaged new upstream versions of Jetty 9, Marc4j and
httpcomponents-client5.

- I fixed a dependency issue in apache-jena. Previously I used a workaround to
build the package from source which is no longer required now.

- Another Debian contributor packaged xerial-sqlite-jdbc, which means that we
now fully support all extensions and all major databases (SQLite, MariaDB,
PostgreSQL).

- I have created a Debian wiki page to give people a brief overview of
OpenRefine.

https://wiki.debian.org/Java/OpenRefine

I will add more information and some pictures in the future.

That's all for now.

Best,

Markus

Thad Guidry

Mar 2, 2022, 9:47:30 PM
to openref...@googlegroups.com
Thanks so much Markus for the update!

So, 2 questions which you may already have answered over the last 6 months, in which case I apologize...

1. why the curl dependency?
2. why the Tomcat dependency?

Markus Koschany

Mar 3, 2022, 5:35:58 AM
to openref...@googlegroups.com
Hi Thad,

On Wednesday, 02.03.2022 at 20:47 -0600, Thad Guidry wrote:
> Thanks so much Markus for the update!
>
> So 2 questions which I might have missed from you answering already over the
> last 6 months so I apologize...
>
> Looking at https://packages.debian.org/bookworm/openrefine
>
> 1. why the curl dependency?
> 2. why the Tomcat dependency?

curl is required by the refine script. I have only made some small
Debian-specific changes to the script, but otherwise it is pristine [1]. I'm not
sure whether I should trim it down to a minimal version or keep certain aspects
of it. For a minimal version we probably don't need curl anymore.
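For illustration only: if curl is used in the script mainly to probe whether OpenRefine is already answering on its port (an assumption here; the script itself should be checked), a minimal version could do that with bash's built-in /dev/tcp redirection instead. `is_listening` is a hypothetical helper; 3333 is OpenRefine's default port:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if something accepts TCP connections on
# host:port, using bash's /dev/tcp pseudo-device instead of curl.
is_listening() {
    local host=${1:-127.0.0.1} port=${2:-3333}
    # Open fd 3 in a subshell so a failed connect cannot abort the caller;
    # the subshell's exit status is the result, and the fd closes on exit.
    (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null
}
```

This only shows that some process accepts connections on the port, which is weaker than an HTTP check with curl -f, and it requires bash rather than a plain POSIX sh.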

I added the libtomcat9-java dependency because I thought I needed the
javax.annotation classes, which are in tomcat9-annotations-api.jar.
libtomcat9-java provides core libraries but not the Tomcat server itself.
javax.annotation is distributed under different group names in Maven, and at one
point we decided to use only the Tomcat version. It seems it is actually not
required by openrefine, so I just removed it.

Markus

[1] https://salsa.debian.org/java-team/openrefine/-/blob/master/debian/patches/openrefine-bin.patch

Markus Koschany

May 24, 2022, 7:42:53 AM
to openref...@googlegroups.com
Hi folks,

I just wanted to give you a short summary of what happened in the past two
months regarding OpenRefine in Debian.

So far I have not received any complaints or bug reports, which is a good thing,
but it should not be overrated: openrefine is still quite new and has not yet
been part of a stable release, which most Debian users prefer. Other
Debian-based distributions like Ubuntu or Mint have already picked up the
package and released it, e.g.

https://launchpad.net/ubuntu/+source/openrefine


I spent some time updating various dependencies and looked into the required
steps to package newer releases of apache-jena and odfdom.

* libowasp-esapi-java [1] (required for libowasp-encoder-java->openrefine)
needed a new upstream release because of the reported security vulnerabilities
CVE-2022-24891 and CVE-2022-23457.

* I also updated apache-log4j2, libthrift-java and jsoup and had to fix two
build failures in hibernate-validator{4,5} because of that.

I looked into apache-jena; the latest version is now 4.5.0. There have been
some significant changes, and if you intend to upgrade from the current version
3.17.0, we will need either new artifacts/packages (e.g.
com.apicatalog:titanium-json-ld and org.graalvm.js:js) or patches. Because of
these major changes, I have not upgraded apache-jena yet.

I investigated issue #144 [2] in odfdom, which prevents further upgrades.
Currently the project depends on the non-free artifact org.json:json. Upstream
is willing to change that, but it is not a priority for them, so they would
welcome some help. I believe replacing org.json:json with
com.vaadin.external.google:android-json or com.github.openjson:openjson is
possible, because both projects are almost drop-in replacements. I intend to
give it another try next month.

Last but not least, I uploaded a new revision of openrefine to Debian unstable
to update some build dependencies which we discussed two months ago. The
tomcat9 library dependency was not really necessary.

Regards,

Markus

[1] https://tracker.debian.org/pkg/libowasp-esapi-java
[2] https://github.com/tdf/odftoolkit/issues/144

Thad Guidry

May 24, 2022, 9:46:18 AM
to openref...@googlegroups.com
Thanks Markus so much for continuing to assist us and move this forward incrementally.

--
You received this message because you are subscribed to the Google Groups "OpenRefine Development" group.
To unsubscribe from this group and stop receiving emails from it, send an email to openrefine-de...@googlegroups.com.
--

Antonin Delpeuch (lists)

Jun 26, 2022, 11:28:47 AM
to openref...@googlegroups.com
Hi Markus,

Thanks a lot for this update and sorry for this very slow reply.

On 24/05/2022 13:42, 'Markus Koschany' via OpenRefine Development wrote:
> I looked into apache-jena, the latest version is 4.5.0 now. There have been
> some significant changes and if you intend to upgrade from the current version
> 3.17.0 we will either need new artifacts/packages or patches, e.g.
> com.apicatalog:titanium-json-ld and org.graalvm.js:js. I have not upgraded
> apache-jena because of these major changes yet.

We did update to Jena 4 some time ago already; this will be part of our
upcoming 3.6 release.

> I investigated issue #144 [2] in odfdom which prevents further upgrades.
> Currently the project depends on the non-free artifact org.json:json but
> upstream is willing to change that, although it's not a priority for them,
> hence why they prefer some help. I believe replacing org.json:json with
> com.vaadin.external.google:android-json or using com.github.openjson:openjson
> is possible because both projects are almost drop-in-replacements. I intend to
> give it another try next month.

Overall I am worried about this dependency. There are all sorts of issues
with it, and I feel there is always a risk of discovering new skeletons in
the closet.

Looking at https://opendocumentformat.org/developers/ I do not see any
viable open source alternative to it, though.

It would of course be fabulous if nudging this library in the right direction
could be part of your packaging efforts.

We will follow-up to discuss the renewal of your contract separately.

Thank you so much again for all your work,

Antonin

Markus Koschany

Aug 25, 2022, 5:36:32 PM
to openref...@googlegroups.com
Hi all,

here is another status update of OpenRefine in Debian. I have been mainly
focusing on packaging version 3.6.1 of OpenRefine and its dependencies.

apache-jena:

4.5.0 introduced a new dependency on titanium-json-ld which I packaged as
libtitanium-json-ld-java.

https://salsa.debian.org/java-team/libtitanium-json-ld-java

https://bugs.debian.org/1017644


which in turn required Jakarta JSON packaged as libjsonp2-java

https://salsa.debian.org/java-team/libjsonp2-java

https://bugs.debian.org/1017642

The third new package was language-detector, packaged as
liblanguage-detector-java

https://salsa.debian.org/java-team/liblanguage-detector-java

https://bugs.debian.org/1018100


I also updated libwikidata-toolkit-java to version 0.13.3

https://salsa.debian.org/java-team/libwikidata-toolkit-java

and jsoup (1.15.2), httpcomponents-core5 (5.1.4) and jetty9 (9.4.48), the
latter mainly to fix two security vulnerabilities.

As usual, it will take our ftp team a few days or weeks to process the new
packages. I will get back to you as soon as that happens.

Replacing the non-free json library in odfdom is still on my todo list.

Have a nice weekend

Markus



Thad Guidry

Aug 25, 2022, 6:03:03 PM
to openref...@googlegroups.com
Thanks for the update Markus,

Any idea when a completely free odfdom will be ready? 1 month, 2?


Markus Koschany

Aug 25, 2022, 6:34:35 PM
to openref...@googlegroups.com
>
> Any idea when completely free odfdom will be ready?  1 month, 2 ?

I guess more like 2, because I'm on holiday next week.
