Python in a Docker Container

Vance Briggs

Oct 12, 2021, 3:31:39 AM
to Reading Hackspace
Any Python/Docker experts?

I am trying to "Dockerize" a python project, but am failing...

I can get the project installed and running on my (linux) machine in a python3 virtual env.  I have written a Dockerfile to install the app in a docker container, but it fails when it gets to using pip to install the requirements:

...
RUN pip install -r requirements.txt
...

It fails while building wheels for some of the packages, which means the whole container build fails.  When I run the same install command in a python3 venv on my machine, the requirements install without a problem; in fact I am not even sure it builds any wheels at all, it may just be downloading binary wheels from somewhere.

I don't understand why this should work in a venv on my machine, but won't install into a docker container - surely pip has access to the same packages in both environments?

Dockerfile and requirements.txt attached

Any suggestions for troubleshooting this welcome.

Thanks

Vance
requirements.txt
Dockerfile

Jeremy Poulter

unread,
Oct 12, 2021, 3:35:26 AM10/12/21
to rLab List
Hi,

What is the error? If it is trying to build something from source, you may also need to install `build-essential`.
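
i.e. something like this early in the Dockerfile, before the pip step (assuming a Debian/Ubuntu-based image):

  RUN apt-get update && apt-get install -y build-essential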

Vance Briggs

Oct 12, 2021, 3:43:48 AM
to Reading Hackspace
It is failing to build a few of the dependencies for pyproj.  In my troubleshooting I had installed build-essential, but removed it again as it didn't solve the issue; that may have been a different problem, though.  I'll add it back into the Dockerfile and retry.

Here is the first error:

  Building wheel for pyproj (pyproject.toml): finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /usr/local/bin/python /usr/local/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /tmp/tmpzll1cxor
       cwd: /tmp/pip-install-7nlmrqp0/pyproj_1eef29502d43471aa48af26b5bd71e97
  Complete output (47 lines):
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build/lib.linux-x86_64-3.10
  creating build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/__main__.py -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/proj.py -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/__init__.py -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/transformer.py -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/network.py -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/geod.py -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/datadir.py -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/sync.py -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/_show_versions.py -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/aoi.py -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/utils.py -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/enums.py -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/exceptions.py -> build/lib.linux-x86_64-3.10/pyproj
  creating build/lib.linux-x86_64-3.10/pyproj/crs
  copying pyproj/crs/coordinate_operation.py -> build/lib.linux-x86_64-3.10/pyproj/crs
  copying pyproj/crs/__init__.py -> build/lib.linux-x86_64-3.10/pyproj/crs
  copying pyproj/crs/crs.py -> build/lib.linux-x86_64-3.10/pyproj/crs
  copying pyproj/crs/coordinate_system.py -> build/lib.linux-x86_64-3.10/pyproj/crs
  copying pyproj/crs/datum.py -> build/lib.linux-x86_64-3.10/pyproj/crs
  copying pyproj/crs/enums.py -> build/lib.linux-x86_64-3.10/pyproj/crs
  copying pyproj/crs/_cf1x8.py -> build/lib.linux-x86_64-3.10/pyproj/crs
  copying pyproj/_sync.pyi -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/database.pyi -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/_compat.pyi -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/list.pyi -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/_crs.pyi -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/_network.pyi -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/_datadir.pyi -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/_transformer.pyi -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/_geod.pyi -> build/lib.linux-x86_64-3.10/pyproj
  copying pyproj/py.typed -> build/lib.linux-x86_64-3.10/pyproj
  running build_ext
  building 'pyproj._geod' extension
  creating build/temp.linux-x86_64-3.10
  creating build/temp.linux-x86_64-3.10/pyproj
  gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/include -I/usr/local/include/python3.10 -c pyproj/_geod.c -o build/temp.linux-x86_64-3.10/pyproj/_geod.o
  pyproj/_geod.c:645:10: fatal error: geodesic.h: No such file or directory
    645 | #include "geodesic.h"
        |          ^~~~~~~~~~~~
  compilation terminated.
  error: command '/usr/bin/gcc' failed with exit code 1
  ----------------------------------------
  ERROR: Failed building wheel for pyproj

And the second one:

Building wheel for PyWavelets (setup.py): finished with status 'error'
  ERROR: Command errored out with exit status 1:
   command: /usr/local/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-7nlmrqp0/pywavelets_705568b45864417ea5045330e2a5cb8a/setup.py'"'"'; __file__='"'"'/tmp/pip-install-7nlmrqp0/pywavelets_705568b45864417ea5045330e2a5cb8a/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-q1_z8wm8
       cwd: /tmp/pip-install-7nlmrqp0/pywavelets_705568b45864417ea5045330e2a5cb8a/
  Complete output (11 lines):
  /tmp/pip-install-7nlmrqp0/pywavelets_705568b45864417ea5045330e2a5cb8a/setup.py:62: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
    import imp
  Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/tmp/pip-install-7nlmrqp0/pywavelets_705568b45864417ea5045330e2a5cb8a/setup.py", line 477, in <module>
      setup_package()
    File "/tmp/pip-install-7nlmrqp0/pywavelets_705568b45864417ea5045330e2a5cb8a/setup.py", line 467, in setup_package
      ext_modules = get_ext_modules(USE_CYTHON)
    File "/tmp/pip-install-7nlmrqp0/pywavelets_705568b45864417ea5045330e2a5cb8a/setup.py", line 182, in get_ext_modules
      from numpy import get_include as get_numpy_include
  ModuleNotFoundError: No module named 'numpy'
  ----------------------------------------
  ERROR: Failed building wheel for PyWavelets
  Running setup.py clean for PyWavelets
  ERROR: Command errored out with exit status 1:
   command: /usr/local/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-7nlmrqp0/pywavelets_705568b45864417ea5045330e2a5cb8a/setup.py'"'"'; __file__='"'"'/tmp/pip-install-7nlmrqp0/pywavelets_705568b45864417ea5045330e2a5cb8a/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' clean --all
       cwd: /tmp/pip-install-7nlmrqp0/pywavelets_705568b45864417ea5045330e2a5cb8a
  Complete output (11 lines):
  /tmp/pip-install-7nlmrqp0/pywavelets_705568b45864417ea5045330e2a5cb8a/setup.py:62: DeprecationWarning: the imp module is deprecated in favour of importlib and slated for removal in Python 3.12; see the module's documentation for alternative uses
    import imp
 
  `setup.py clean` is not supported, use one of the following instead:
 
    - `git clean -xdf` (cleans all files)
    - `git clean -Xdf` (cleans all versioned files, doesn't touch
                        files that aren't checked into the git repo)
 
  Add `--force` to your command to use it anyway if you must (unsupported).
 
  ----------------------------------------
  ERROR: Failed cleaning build dir for PyWavelets

Hugo Mills

Oct 12, 2021, 3:45:27 AM
to reading-...@googlegroups.com
I think Jeremy's probably put his finger on it already, but a
couple of more generally useful points here:

One useful point (which I think you've already dealt with): don't
use alpine as a base -- it uses a different libc to most
distributions, which means that precompiled wheels won't work. This
means that you have to build a lot more of the python dependencies
from scratch, which makes your build time go up hugely, and the size
of the image goes up a lot too. Using a Debian base is generally
better here.

Second, you can investigate the problems directly in more detail by
commenting out the failing lines and setting your CMD to something
that does nothing for a long time (like "sleep 86400"). Then you can
build and start a partial container and connect to it with

$ docker exec -it <container> bash

At this point, you can run the problematic command(s) in the same
environment that the build process uses, and see where the problem
is.
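
For example (a rough sketch; the image and container names here are
just placeholders):

  # in the Dockerfile, comment out the failing step and keep the container alive instead
  # RUN pip install -r requirements.txt
  CMD ["sleep", "86400"]

Then:

  $ docker build -t myapp-debug .
  $ docker run -d --name myapp-debug myapp-debug
  $ docker exec -it myapp-debug bash

and re-run "pip install -r requirements.txt" by hand inside the
container to see the full error in context.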

Hugo.
--
Hugo Mills | Great films about cricket: Umpire of the Rising Sun
hugo@... carfax.org.uk |
http://carfax.org.uk/ |
PGP: E2AB1DE4 |

Mark Robson

Oct 12, 2021, 4:09:47 AM
to reading-...@googlegroups.com
Hi Vance,

I would agree with Hugo: don't use Alpine Linux as a base distro - just use Ubuntu for minimum fuss (the image will be bigger, but just tolerate it). I usually use Ubuntu 20.04, which is the LTS (long term support) release and so is likely to receive updates for longer than the other releases.

It looks like the problem might be that the build-time dependencies of some of the modules aren't satisfied. Usually you can fix those by installing the relevant -dev package for whatever library the failing module is compiled against.

I use the package "build-essential" on Ubuntu which installs a bunch of stuff that's usually required to build anything else.

But generally I find Docker a pain in the arse, solving a problem (I'm not even sure what that problem is) that is probably already solved by just using a VM with a known software stack.

Hugo Mills

Oct 12, 2021, 4:25:41 AM
to reading-...@googlegroups.com
On Tue, Oct 12, 2021 at 08:43:35AM +0100, Vance Briggs wrote:
> It is failing to build a few of the dependencies for pyproj. In my
> troubleshooting I had installed build_essential, but have removed it again
> as it didn't solve the issue, but that could have been another problem.
> I'll add it back in to the Dockerfile and retry.

build-essential is... well... essential. It pulls in little things
like the C compiler and the libc development files. I'd be surprised
if anything worked without it.

> Here is the first error:
[snip]
> pyproj/_geod.c:645:10: fatal error: geodesic.h: No such file or directory
> 645 | #include "geodesic.h"
> | ^~~~~~~~~~~~
> compilation terminated.
> error: command '/usr/bin/gcc' failed with exit code 1
> ----------------------------------------
> ERROR: Failed building wheel for pyproj

$ apt-file search geodesic.h
geographiclib-doc: /usr/share/doc/geographiclib/html/geodesic.html
grass-doc: /usr/share/doc/grass-doc/html/d.geodesic.html
libboost1.74-dev: /usr/include/boost/graph/detail/geodesic.hpp
libproj-dev: /usr/include/geodesic.h
sagemath-doc: /usr/share/doc/sagemath/html/en/reference/hyperbolic_geometry/sage/geometry/hyperbolic_space/hyperbolic_geodesic.html

It looks like you need "libproj-dev" for that one. You will
probably need to do this repeatedly until you've found all the
dependencies (or, if you're lucky, there will be some documentation
for this package online that says what the build dependencies are on
Debian/Ubuntu).
[snip]

This one appears to be numpy that's missing, which is odd as I'd
expect pip to be able to find that one. Maybe it wasn't declared in
the package's dependencies for some reason. It should be installable
with pip or apt.
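
So, very roughly, something like this before the pip step (untested
sketch, assuming a Debian/Ubuntu-based image; there may well be more
-dev packages to add as you hit them):

  RUN apt-get update && apt-get install -y --no-install-recommends \
          build-essential \
          libproj-dev \
      && rm -rf /var/lib/apt/lists/*
  # installing numpy first may help packages whose setup.py imports numpy at build time
  RUN pip install numpy
  RUN pip install -r requirements.txt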

Hugo.
Hugo Mills | Go not to the elves for counsel, for they will say
hugo@... carfax.org.uk | both no and yes.
http://carfax.org.uk/ |
PGP: E2AB1DE4 |

Vance Briggs

Oct 12, 2021, 7:50:24 AM
to Reading Hackspace
Hugo, Jeremy, Mark,

Thanks for your help.  The problem I am trying to solve with Docker is that I take the pain of building this once, and then others can get a working system much more simply.  My desktop machine is running Ubuntu, but I expected pip to be able to pull the same packages under any Python version - I was obviously wrong.  (Edit: watch out for differing Python versions - the latest versions may not have binary wheels available for all packages yet.)

TL;DR - Reverted to python:3.9-bullseye as the base image; all Python packages installed as binary wheels, and everything runs!

Troubleshooting steps:

I added numpy explicitly to requirements.txt (this would have happened automatically with `pip freeze`, but I had used pip-chill to reduce it to the minimal requirements and let pip install the dependencies - that didn't work :-).  I also added libproj-dev using apt.

This fixed the first error, but the second - missing numpy - was still an issue.  I did a proper `pip freeze` to get a requirements.txt with version numbers from the running venv, instead of using pip-chill.  This also didn't build, as it couldn't satisfy some of the package versions, namely scipy==1.7.1 and pkg_resources==0.0.0.  I changed scipy to 1.6.1 and removed pkg_resources from requirements.txt, thinking it could be installed as a dependency if needed.  Even after these manual tweaks to requirements.txt it still failed to build with numpy missing.

It looks like that build failed because it needed Cython installed - not obvious from the error, but with Cython added it builds.  Although it built, it then failed to run because of conflicts between Cython 0.29.24 and Python 3.10, so I reverted to Python 3.9; all packages then installed from binary wheels and everything worked!
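
For reference, the Dockerfile now boils down to roughly this
(simplified sketch; the entry point name here is made up):

  FROM python:3.9-bullseye
  WORKDIR /app
  COPY requirements.txt .
  # with 3.9-bullseye, every requirement installs from a binary wheel
  RUN pip install --no-cache-dir -r requirements.txt
  COPY . .
  CMD ["python", "app.py"]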

Thanks again

Vance

Matt Bruce

Oct 12, 2021, 10:24:22 AM
to rLab / Reading's Hackspace
This thread has been very useful, as I'm looking at creating some Docker containers in the near future. I now have a much better idea of how to approach it and what to watch out for.

I'm grateful that you pulled out your hair to save me having to pull out mine. :)

-Matt

Vance Briggs

Oct 12, 2021, 11:04:42 AM
to Reading Hackspace
Matt,

Glad it helped, but I have further to go and am now thinking through the "next problem", that of the correct architecture of the final app...

For background - I have had this app running on a server and am now trying to "dockerize" it...

The app has a very simple Flask UI with a single web page and a couple of HTTP API calls.  The processing takes a lot of time, so making the client wait for the HTTP response is not really best practice.  I have used Redis queues to manage incoming requests (jobs), with workers processing those jobs asynchronously.  The UI can then poll the front end (which in turn asks Redis) to find out when its job is finished...

The problem I have now is what the best Docker architecture is to support this.  From the Docker documentation it is clear that I need to use multiple containers (https://docs.docker.com/config/containers/multi-service_container/).  Breaking this down, it is obvious that I should have a separate Redis container, but should I then stand up one or two more containers - a UI container and a Worker container, or a single combined container for both?  The thing pushing me towards a single container is that the UI places a job on the Redis task queue, but to do this it needs to import the worker function from the Worker back-end code.  So either both are in a single container, which gives easy access, or the UI container also has to carry the worker code so that it can be imported and enqueued.
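
To illustrate, the enqueue in the Flask code currently looks roughly like this (module and function names simplified):

  from redis import Redis
  from rq import Queue
  from worker import process_job          # the UI container needs the worker code just for this import

  q = Queue(connection=Redis(host="redis"))   # "redis" being the name of the Redis container/service
  request_data = {"some": "payload"}           # whatever the job needs
  job = q.enqueue(process_job, request_data)

I believe rq will also accept a dotted-string reference, e.g. q.enqueue("worker.process_job", request_data), which would avoid the import on the UI side, but the worker container still needs the real code either way.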

The Redis worker process that pulls jobs off the queue has been running as a daemon in my non-dockerized implementation (rq worker), but I don't think enough information is passed over the queue for the job to run on its own - there are other Python files imported by the job, so I think the worker (rq) needs access to those files as well, but I am unclear how that happens...

I haven't really got my head around this yet.  Outstanding questions in my mind are:
  • What is actually placed on the redis task queue?
    • A reference to the python job function?
    • How would this correlate across containers?
  • How does rq use the job information from the queue and how does it find the dependencies?
    • In my rq worker service script I have the working directory set to the base directory of the project, so maybe it uses this as a reference point?
Anyway, more fun to come :-)

Vance

Jeremy Poulter

Oct 12, 2021, 1:10:08 PM
to rLab List
Typically you want each process in a separate container, so from what you have described you are probably looking at Redis, your Python app, and Apache/Nginx/another web server of your choice to serve the app and act as a reverse proxy for it.  If there is a database or data store, that would be a separate container as well.
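
As a very rough sketch (image names and command lines are placeholders), a docker-compose.yml for that layout might look something like:

  version: "3"
  services:
    redis:
      image: redis:6
    web:
      build: .                # same image as the worker
      command: gunicorn app:app --bind 0.0.0.0:8000   # placeholder entry point
      depends_on:
        - redis
    worker:
      build: .                # same image as the web service
      command: rq worker --url redis://redis:6379
      depends_on:
        - redis
    nginx:
      image: nginx
      ports:
        - "80:80"
      depends_on:
        - web
      # would also need a config mounted in to reverse-proxy to the web service

The key point is that web and worker are built from the same image, so they share the same code even though they run as separate containers.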

Jeremy

Mark Robson

Oct 12, 2021, 1:15:33 PM
to reading-...@googlegroups.com
To give my quite un-dockerishly-correct opinion:

It is going to be *much* easier to run the different parts of your service in the same container. If your service does not need to store persistent data, and it's just a queue, you can use a much simpler implementation which doesn't need so many third party components and is more likely to work.

While the "One True Docker Way" is to run every single little tiddly thing in its own container, there is nothing really stopping you from running several things in one.

You can use a non-persistent database to store any intermediate data (SQLite is a popular choice; make sure your app creates all its tables at start-up and then you can do whatever you want). You can also use the non-persistent Docker filesystem, which will get wiped whenever the container is restarted (if Docker feels like it; it depends on the container etc.). All filesystems in Docker are non-persistent unless you've told them to be persistent (and in my experience, they are sometimes non-persistent even then).

So multiple processes in the same Docker container can just share things as they would normally on the same machine, including starting subprocesses, writing files and checking file contents. Provided you can do that without creating a race condition (e.g. TOCTTOU races), that should be sufficient.

Personally I'd probably use a sqlite DB or write the job status in files, then the polling process can check the database or file to see if it's finished.
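
Something as simple as this on the worker side, for instance (a sketch; the path is made up):

  import json, os

  STATUS_DIR = "/tmp/jobs"

  def write_status(job_id, state):
      os.makedirs(STATUS_DIR, exist_ok=True)
      with open(os.path.join(STATUS_DIR, job_id + ".json"), "w") as f:
          json.dump({"state": state}, f)

  def read_status(job_id):
      path = os.path.join(STATUS_DIR, job_id + ".json")
      if not os.path.exists(path):
          return {"state": "unknown"}
      with open(path) as f:
          return json.load(f)

The worker calls write_status() as the job progresses, and the web process calls read_status() whenever the UI polls.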

---

The problem I always have with Docker is that there is no standard way to run periodic jobs etc. You could run systemd (but systemd has its opponents and does lots of things which aren't useful inside Docker), or you could run cron, but cron is a pain to debug without syslog.

I've been here before and I ended up with a container running cron, syslog and a web server. Syslog was only there because without it cron seems to send its output nowhere (not even to the Docker console log) - and that includes the diagnostic messages explaining why it won't run jobs.

Mark


Jeremy Poulter

Oct 12, 2021, 2:02:27 PM
to rLab List
Hi,

The problem with that is that Linux containers (on which Docker is built) are not really designed to run multiple processes. They are a mechanism for process isolation, for making sure a process only has access to what it is supposed to have access to.

So while you can run multiple processes in a single container it may cause more problems than it solves.

This is also the reason Docker doesn't have cron-like functions built in: the intention is that you would have a container with a scheduler of your choice, then a container for each job. While this may seem like overkill, remember that each container only stores the differences between your base image and the job.

Think of it as object orientation for Linux processes. Just as putting all your C code in a single function may initially be easy, as your project evolves it becomes harder and harder to maintain.

Jeremy

Mark Robson

Oct 12, 2021, 3:18:36 PM
to reading-...@googlegroups.com
Yeah, whatever,

Well I didn't put all my code in one function!

Basically I had some code which I wanted to run *both* as a "cron" job and as a web server back-end process. It would do essentially the same thing, either on a schedule or interactively on demand.

I'm sure there is a docker-ishly correct way to do that, but I don't know exactly what it would be (probably some bollocks with microservices!). I could have some external scheduler spin up a container every time the scheduled jobs are required (which is quite often), but then that container would need heaps of libraries that are exactly the same as in the web server container (I used a separate container for my persistent database).

So, trying to avoid masses of repeated code (basically the same container with different startup parameters?), I just ran them all in the same container. This is probably not the best way to do it, but it worked for me.

The biggest problem I have with Docker is that it is almost, but not quite, like a "real" VM. 99% of things will work exactly the same in Docker as outside it, but the 1% that don't will suck up (developer) time to diagnose the fault.

I also like to be able to log into the machine in a remote shell (you can generally *sort of* do this with Docker, depending on the container infrastructure, but maybe the command line history gets erased, or maybe the terminal doesn't quite behave how you expect, or 100 other tiny annoyances) and then run tools such as "ls", "top" and "strace", or examine log files - all of which Docker containers seem to make more difficult, usually just by not having those tools in the container.

I particularly dislike the idea of "let's only have one binary in this container": then you can't even open a shell, much less run useful commands to see what's happening.

Of course it might be possible to run some of those commands on the Docker container host machine, but maybe you're running all of this in a "cloudy" infrastructure thing where the host machine isn't even visible?

Maybe I've just been bitten by a particularly poor setup that I have been forced to use by the corporate machine.

Mark


Vance Briggs

Oct 13, 2021, 3:33:28 AM
to Reading Hackspace
Hi guys,

Sorry, I didn't mean to start a Docker vs anti-Docker war.  I think there are strong arguments on both sides, and if you bend Docker to your cause then you have to live with the consequences.  Understood :-)

If anyone can shed any light on the questions around using Redis task queues, and how a Python function and its dependencies can be referenced from separate containers through the queue, I would welcome your insight.  I am currently presuming that I would need the code in both containers, but I'm not sure what happens if one were to get out of sync with the other, etc.

Thanks

Vance
