Hi,
here is another status report of the OpenRefine inclusion into Debian. We are
pretty much at the end of the road now. There are six source packages left that
have to be accepted by Debian's ftp team. I'm in contact with one of those
responsible because there were some concerns about copyright issues in Apache
Jena and the ODFDOM library. However, I believe this has been sorted out and I
hope we can move forward now and clear the rest of the queue.
In the meantime I have backported the newly available OpenRefine dependencies
(22 source packages which I mentioned in my previous posts) to our current
stable release via bullseye-backports, a suite dedicated to bringing new
software to an otherwise frozen release. Most of them have been accepted
already and can be downloaded once you add the following entry to your
/etc/apt/sources.list file:
deb http://deb.debian.org/debian/ bullseye-backports main
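As a rough sketch of how this could look in practice, the entry can also live
in its own file under /etc/apt/sources.list.d/. The file name and the LIST_FILE
variable below are only illustrative assumptions; the variable defaults to a
local path so the snippet can be tried without root.

```shell
#!/bin/sh
# Sketch only: LIST_FILE is a hypothetical variable. On a real system it
# would point to e.g. /etc/apt/sources.list.d/bullseye-backports.list
# (writing there requires root).
LIST_FILE="${LIST_FILE:-./bullseye-backports.list}"

# Write the backports entry on a single line.
echo 'deb http://deb.debian.org/debian/ bullseye-backports main' > "$LIST_FILE"
cat "$LIST_FILE"

# Afterwards, as root:
#   apt update
#   apt install -t bullseye-backports <package>
```

Packages from backports are not installed by default, so the target release
has to be requested explicitly with -t, as in the commented commands above.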
Docker image
============
I have created a Docker image of OpenRefine based on the official Debian docker
image. There are already plenty of OpenRefine docker images available based on
Alpine Linux and/or OpenJDK 8. My motivation to create another one was to
identify possible dependency problems, e.g. I discovered that OpenRefine should
depend on procps in order to use the "free" command in the refine start script.
This could have gone unnoticed because procps is pulled in by other
dependencies on almost every standard desktop system.
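A start script could also guard against this kind of missing dependency
explicitly. The following is only a sketch, not the actual refine script; the
require_cmd helper is a made-up name for illustration.

```shell
#!/bin/sh
# Hypothetical helper: report whether a command is available in PATH.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "Missing command: $1" >&2
    return 1
}

# "free" is shipped in the procps package on Debian.
if require_cmd free; then
    free -m
else
    echo "Please install procps." >&2
fi
```

A check like this only improves the error message at runtime. The proper fix
is still the package dependency on procps.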
I think it's also a nice way to experiment with the official Debian packaging
on non-Linux systems by simply running Docker and inspecting the image. Here is
a quick tutorial to download the image and run the container.
# 1. Install the image
docker pull apo1999/openrefine
# 2. Run OpenRefine in the background and listen on port 3333
#    OpenRefine should be accessible in your web browser now
docker run -d -p 3333:3333 --name openrefine-test apo1999/openrefine
# 3. Inspect the container
docker exec -it openrefine-test /bin/bash
# 4. Stop the container
docker stop openrefine-test
# 5. Remove the container
docker rm openrefine-test
# 6. Remove the image
docker rmi apo1999/openrefine
Warning: The image is quite large (1 GB uncompressed, ~600 MB compressed). I
intend to experiment with apt's --no-install-recommends option in the future,
which should reduce the number of packages, but this image will always be
larger than a mere Alpine or OpenJDK image. Currently I find it useful for
development purposes; later the image will depend on the official packages in
bullseye-backports.
Do you have more Docker ideas? Anything you would like to see implemented?
Creating a volume for persisting data is already on my todo list.
New package updates
===================
I have packaged new upstream releases of httpcomponents-client5,
httpcomponents-core5, openrefine-butterfly, apache-log4j2,
google-http-client-java and google-api-client-java. Because of the latter two
I could drop a patch for google-api-services-sheets-java, which is needed for
the extensions. I also updated OpenRefine itself to version 3.5.1.
I fixed Debian bug #1002274, an FTBFS (fails to build from source) in jetty9,
which was caused by a missing or wrong dependency on the servlet API due to
changes in reverse-dependencies.
OpenRefine 3.5.1
================
I noticed the upgrade to log4j2 because of the prominent Log4Shell security
vulnerability. I suggest moving straight to version 2.17.1 because some
additional, though less severe, CVEs were discovered afterwards and fixed in
2.17.0 and 2.17.1.
I encountered a compilation problem in the server module because of the
dependency on log4j-slf4j-impl. It works for me if I change the artifact to
log4j-1.2-api in server/pom.xml. Looking at Refine.java, which makes use of
the org.apache.log4j.Level class, I'm not sure why you depend on
log4j-slf4j-impl instead, because the Level class is in log4j-1.2-api or
log4j-api.
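For illustration, the artifact swap described above could be scripted roughly
like this. It is only a sketch under assumptions: the swap_artifact function is
a hypothetical helper, it does a plain textual replacement of the artifactId,
and a real Debian package would probably carry this change as a patch instead.

```shell
#!/bin/sh
# Hypothetical helper: replace log4j-slf4j-impl with log4j-1.2-api in a POM.
# Both artifacts share the groupId org.apache.logging.log4j, so only the
# artifactId is touched.
# Usage: swap_artifact path/to/pom.xml
swap_artifact() {
    pom="$1"
    tmp="$pom.tmp"
    sed 's|<artifactId>log4j-slf4j-impl</artifactId>|<artifactId>log4j-1.2-api</artifactId>|' \
        "$pom" > "$tmp" && mv "$tmp" "$pom"
}

# Example call, assuming the OpenRefine source tree is the current directory:
if [ -f server/pom.xml ]; then
    swap_artifact server/pom.xml
fi
```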
To understand my problem, you need to know that we don't ship the *-impl
artifact because of CVE-2018-8088 in slf4j. Apache Log4j2 still depends on an
older version of slf4j to work around the removal of the EventData class, which
makes it impossible for us to build the log4j-slf4j-impl artifact without
patching Apache Log4j2.
What's next
===========
I assume the packaging of OpenRefine and the backport to bullseye will be
completed in 3-6 weeks, provided the ftp team's review process continues to be
positive. I will report back as soon as I can.
Regards,
Markus