barman-cloud-backup issue: Missing required python module: boto3


SteveB

Jun 8, 2022, 4:31:27 PM
to Barman, Backup and Recovery Manager for PostgreSQL
We are having a boto3 issue when running barman-cli-cloud in the environment described below.

Environment:
We installed Barman on the same host as the PostgreSQL database (under its own "barman" operating system account and with dedicated storage for backups separate from the database storage). Barman was installed using RPM packages.  We also shared SSH credentials for bi-directional password-less connections between the postgres and barman accounts on the same host.

This permits us to operate Barman as if it is remote from the database, but it keeps all communication local over the host LAN.  We have been testing for the past three months -- taking daily full backups and WAL file backups (streaming and shipping) with this topology in a test environment with great success.

Issue:
We are now trying to implement barman-cli-cloud with hook scripts to relay backup copies to AWS S3 cloud storage (Cloudian), but it seems we are hitting an issue with software installation/configuration.  barman-cli-cloud was also installed using RPM packages on the same host.  When we test barman-cloud-backup, it gives a message that the boto3 python module is missing:

  barman-cloud-backup -P cloudian -t s3://mybucket/  pg_test_db
  Missing required python module: boto3

We have an AWS S3 compliant platform that is set up with existing buckets and we can run aws client commands from the host (using the "barman" account) to view the buckets.

For Python, we installed miniconda3 and activated a Python 3.9 virtual environment under the "barman" account, and also ran pip install for boto3 (boto3-1.24.1-py3) in the virtual environment.  However, it made no difference when we ran the barman-cloud-backup test again.

Has anyone encountered an issue like this?  Should we be approaching the barman-cli-cloud, Python or boto3 install/config differently?  We would prefer to keep the overall "virtual remote" topology since the database is in a distributed environment (no data center), unless that proves to be a bad choice for some yet-unknown reason.

Note, there are two other Python installations (system-wide, non-virtual) on the host.  We used the virtual environment for barman to avoid possibly impacting the Postgres system-wide software configuration.

Thanks in advance for any suggestions,
Stephen

Ravi Chauhan

Jun 9, 2022, 2:56:06 AM
to Barman, Backup and Recovery Manager for PostgreSQL
Hi Steve,

Can you also share which OS, Postgres and Barman versions you are running?

As you mentioned RPM, I am assuming it is RHEL. Please have a look at https://groups.google.com/g/pgbarman/c/3pzjnFbb4r4, where I faced the same problem with Azure storage.

Ravi

Michael Wallace

Jun 9, 2022, 6:23:47 AM
to pgba...@googlegroups.com
To add to Ravi's response, the barman cloud scripts installed by the RPM package are going to be looking in the system python installation for boto3 so installing boto3 into a miniconda3 environment is not going to help.
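One way to confirm which Python a packaged script will use is to read its first line. The helper below is only a sketch: `shebang_interpreter` is a name made up for this illustration, and it assumes the barman-cloud-backup script starts with an ordinary `#!` line.

```shell
# Print the interpreter named on the first line (shebang) of a script.
# shebang_interpreter is a hypothetical helper, not part of Barman.
shebang_interpreter() {
    head -n 1 "$1" | sed -n 's/^#![[:space:]]*//p'
}

# Possible usage on the affected host (not executed here):
#   script="$(command -v barman-cloud-backup)"
#   interp="$(shebang_interpreter "$script")"
#   $interp -c 'import boto3' && echo "boto3 is importable"
```

If the interpreter printed is the system Python rather than the one inside the miniconda3 environment, a boto3 installed only in that environment will not be visible to the script.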

If you did want to use a virtualenv then you would need to install barman via pip into that virtualenv. I'd recommend resolving the issue with the RPM-installed barman-cli-cloud instead, but if you did want to try `pip install barman` in the miniconda3 virtualenv then you would also need the libpq dev package and gcc available on your system (`yum install libpq5-devel gcc`, depending on your OS) so that pip can build psycopg2 during the install process.

Hope this helps,

Mike


SteveB

Jun 9, 2022, 3:19:18 PM
to Barman, Backup and Recovery Manager for PostgreSQL
Thank you Ravi and Mike.

Sorry I forgot to include the version information in my original post.  Here are the specifics:

  OS         Red Hat Enterprise Linux Server 7.9 (Maipo)
  Postgres   postgres (PostgreSQL) 11.15
  Barman     2.19 Barman by EnterpriseDB (www.enterprisedb.com)

barman-cli-cloud is also at 2.19.

We are open to resolving the issue by using the RPM-installed barman-cli-cloud instead of a virtual env.  If it can be implemented with no impact to the existing Postgres database operation, that would be the ideal scenario.  Eventually we will be rolling out the solution to multiple production sites via Ansible playbooks as we transition to Barman from BART.

I will consult with our Linux sys admins to see if we can get it working without the virtual environment, and will report back here.

Best regards,
Steve

Michael Wallace

Jun 9, 2022, 5:07:07 PM
to pgba...@googlegroups.com
Hi Steve,

Unfortunately on CentOS 7 the python-boto3 package provided in the base repo will not work with all the barman cloud features (tags are not supported for example, which are used to implement the keep-backup functionality), so my earlier advice to resolve this at the OS package level doesn't apply here. I think your options are to either use pip to install boto3 into the system python (which, from what you describe, is not really a suitable option here) or to install barman into your virtualenv via pip (the PyPI package includes the barman-cloud scripts).

Best regards,

Mike

SteveB

Aug 9, 2022, 12:00:04 PM
to Barman, Backup and Recovery Manager for PostgreSQL
After a long break we are now back to working on this.  We created a shared virtual environment (miniconda3, barman 3.0.1, boto3) under the /usr/local/share/.venv directory, which gives us the flexibility to use the environment from multiple operating system accounts (barman and postgres) without disturbing the system python setup.  We did install libpq5-devel and gcc at the system level, as suggested, and removed the system installation of barman.  Note that removing barman from the system also removed the barman system cron configuration, but the barman operating system account and its files were not affected.

We successfully implemented hook scripts to forward Barman backups and WAL archive files to S3 storage.  We created a shell script to execute the "barman cron" command (it first needs to activate the shared virtual environment).

Thus far we have not tried any retrieve operations from S3, but it is progress.  Thanks to everyone for your help getting us to this point!


Not sure if this should be for the same thread, but it is the same environment, so I will start here ...

The only unusual thing we are now seeing is these messages in the database log file during WAL switches:

bash: barman: command not found
ERROR: Error executing ssh: [Errno 32] Broken pipe
Exception ignored in: <function _Stream.__del__ at 0x7f6c5105b700>
Traceback (most recent call last):
  File "/usr/local/share/miniconda3/lib/python3.9/tarfile.py", line 410, in __del__
    self.close()
  File "/usr/local/share/miniconda3/lib/python3.9/tarfile.py", line 460, in close
    self.fileobj.write(self.buf)
ValueError: write to closed file

It occurs at every log switch and does not seem related to cloud communication, as it happens even when there are no defined hook scripts.

The database archive_command parameter is:   /data/tepgbarman02/barman_wal_archive.sh %p %f

The shell script runs the barman-wal-archive command, similar to this example:

     barman-wal-archive  db_host1.mydomain.com  tepgbarman02  $1

It throws the ignored exception and everything appears to continue normally.  
We observe the WAL file in question is closed and it sits in the streaming directory until the next successful execution of "barman cron" moves it to the Barman wals directory.
We can also see a new .partial file is opened in the streaming directory as soon as there is database activity.

Does anyone know what might be causing the messages or how to troubleshoot this?

Thanks,
Steve

Michael Wallace

Aug 9, 2022, 12:37:42 PM
to pgba...@googlegroups.com
Glad to hear you made some progress!

Regarding the exception, the `barman-wal-archive` command runs `barman put-wal` over ssh, so the "bash: barman: command not found" message suggests that the virtualenv needs to be sourced in the environment in which the remote `barman put-wal` command is executed. You could potentially source the virtualenv in the ~/.bashrc file for the barman user.

It's odd that the archive_command is raising an exception but everything is continuing normally, since a failing archive_command should cause PostgreSQL to retry. One possibility is that the `barman_wal_archive.sh` wrapper is not passing the exit status of the `barman-wal-archive` command back to PostgreSQL; however, I will try to reproduce this scenario myself and report back.
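To illustrate the exit status point, here is a minimal sketch of how such a wrapper could hand the result back to PostgreSQL. The real `barman_wal_archive.sh` is not shown in this thread, so this is an assumption about its shape; `archive_wal` and the `ARCHIVE_CMD` override are hypothetical names added so the function can be exercised without Barman installed. The host and server names are the ones from the example command above.

```shell
# Sketch of a wrapper body; the actual script in this thread is not shown.
# ARCHIVE_CMD is a hypothetical override used only for dry runs.
archive_wal() {
    wal_path="$1"
    "${ARCHIVE_CMD:-barman-wal-archive}" db_host1.mydomain.com tepgbarman02 "$wal_path"
    status=$?
    if [ "$status" -ne 0 ]; then
        # Report on stderr so the message lands in the PostgreSQL log.
        echo "barman-wal-archive failed for $wal_path (exit $status)" >&2
    fi
    # Return the unchanged status so the caller can exit with it.
    return "$status"
}

# In the script itself the last lines would be:
#   archive_wal "$1"
#   exit $?
```

PostgreSQL treats a zero exit status from archive_command as success and retries the segment otherwise, so any command that runs after `barman-wal-archive` in the wrapper (an `echo`, a cleanup step) can mask a failure unless the status is saved first, as above.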

Since you are using WAL streaming in addition to archive_command you could bypass this issue by setting `archive_mode = off` in the PostgreSQL configuration and `archiver = off` in the barman configuration. This would mean Barman would rely only on WAL streaming via replication slots in order to receive the WALs in the `streaming` directory (by contrast, barman-wal-archive writes WALs into the `incoming` directory) - `barman cron` will copy WALs from either location and trigger the hook scripts. As long as `streaming_archiver = on` in the barman configuration and replication slots are configured then you should be able to rely only on WAL streaming and still have your hook scripts write the WALs to S3.
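As a rough way to check that a Barman server configuration matches the streaming-only setup described above, something like the following could be used. It is a sketch under assumptions: `conf_value` and `streaming_only` are hypothetical helpers, the parsing ignores INI sections and comments, and it expects the file to contain a single server definition. `barman show-server` remains the better source for the values Barman actually uses.

```shell
# Print the value of KEY from an INI-style file (first match wins).
# Hypothetical helper: ignores [sections], so use it on a one-server file.
conf_value() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | head -n 1
}

# Succeed only if the file disables the archiver, enables the streaming
# archiver, and names a replication slot.
streaming_only() {
    [ "$(conf_value "$1" archiver)" = "off" ] &&
    [ "$(conf_value "$1" streaming_archiver)" = "on" ] &&
    [ -n "$(conf_value "$1" slot_name)" ]
}

# Possible usage (the path is an assumption):
#   streaming_only /etc/barman.d/tepgbarman02.conf && echo "streaming only"
```

The PostgreSQL side (`archive_mode = off`) lives in postgresql.conf and is not covered by this check.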

Hope this helps,

Mike

SteveB

Aug 9, 2022, 4:12:09 PM
to Barman, Backup and Recovery Manager for PostgreSQL
Thanks, Mike.

You were right about sourcing the virtualenv in .bashrc for the barman user.  The issue stopped as soon as I updated .bashrc to activate the virtualenv.  I had been activating it in .bash_profile, but seemingly that was not working as planned.  Thank you for the advice.
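For anyone hitting the same thing: when ssh runs a single command (as `barman-wal-archive` does), bash starts as a non-login shell, so ~/.bash_profile is skipped, while ~/.bashrc is typically still read. Below is a sketch of lines that could go in ~/.bashrc for the barman user. Instead of running the environment's activate script, it only puts the environment's bin directory first on PATH, which is a simpler substitute and not exactly what was done here. The directory /usr/local/share/.venv/bin is an assumption based on the path mentioned earlier, and `prepend_path` is a hypothetical helper.

```shell
# Add a directory to the front of PATH unless it is already listed.
prepend_path() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, leave PATH alone
        *) PATH="$1:$PATH" ;;
    esac
}

# ASSUMPTION: the shared environment keeps its executables here.
prepend_path "/usr/local/share/.venv/bin"
export PATH
```

A quick check from the postgres account is `ssh barman@<barman host> 'command -v barman'`, which should print a path inside the environment rather than nothing. If ~/.bashrc contains an early return for non-interactive shells, these lines need to sit above it.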

Regarding the suggestion about setting archive_mode = off and archiver = off (using WAL streaming with replication slots, etc.), I like the idea, but I suspect it would cause the archive_timeout setting to be ignored.  We were planning on setting archive_timeout to 1800 seconds, which, in combination with the Barman hook scripts, would ensure that the most recent offsite WAL file is no more than roughly 30 minutes old.  Does that sound accurate?

Steve

Michael Wallace

Aug 11, 2022, 6:43:39 AM
to pgba...@googlegroups.com
Hi Steve.

Your plan regarding archive_timeout sounds ok to me - the only reason the offsite WALs would be older than the value of archive_timeout (if everything is working correctly) would be if there was no database activity during that time.

Best,

Mike

Edilberto silva

Aug 11, 2022, 8:45:38 AM
to pgba...@googlegroups.com
My name is Edilberto.
1 - Good morning, my friend. I have a question. If I back up with the bartender, can I still back up with pg_dump, which is what I already do? Or can I just use the bartender?

2 - Without using Barman, how would I do full and incremental backups in PostgreSQL?

Luca Ferrari

Aug 11, 2022, 8:55:02 AM
to pgba...@googlegroups.com
On Thu, Aug 11, 2022 at 2:45 PM Edilberto silva <edit...@gmail.com> wrote:
> 1 - Good morning, my friend. I have a question. If I back up with the bartender, can I still back up with pg_dump, which is what I already do? Or can I just use the bartender?
>

What is bartender?
In any case, no matter which backup solution you use, you will always
be able to use (and abuse) pg_dump.

> 2 - Without using Barman, how would I do full and incremental backups in PostgreSQL?

The PostgreSQL documentation explains it very well, but I strongly
suggest using existing tools such as barman, pgbackrest, and so
on.
If you still want to build your own custom way of doing backups, see
<https://www.postgresql.org/docs/14/continuous-archiving.html>.

Luca
