Recommended practice for using a Python-based command


Arif Shaon

May 13, 2020, 3:50:09 PM
to archivematica
Hello List,

I am trying to use a custom Python script as a command for the "Validation" step in the Archivematica workflow. While this should be relatively straightforward according to the documentation, my script requires additional Python libraries and I am not sure about the best approach to deploying them. So far I have thought of the following approaches:

1. Install the libraries by importing pip within the script, or by running a subprocess that invokes "pip install". Neither of these actually worked: the "pip install" subprocess returned an error on stderr, while importing pip and calling something like pip.main(['install', 'xmlschema']) threw an "attribute not found" error, most likely due to an incompatible pip version.

2. Create a standalone Python tool using PyInstaller or something similar and run it as a command-line tool. This did work, but required more effort than I had anticipated.

Is there any documentation in the Archivematica knowledge base that sheds light on the above?

Also, what is the best place to deploy the standalone tool? The Archivematica shared directory?

I would be interested in your thoughts on the above.

Many thanks in advance.

Best Regards
Arif 

Ross Spencer

May 14, 2020, 6:38:26 PM
to archivematica
Hello Arif,

That's a really fascinating question. It sounds like you're a good way there. I'll write from a generic perspective first for those on the list, and see if that takes you where you need to go.

First, when you're writing a new command for the FPR, we know that we're asking Archivematica to run some tool on the same server that is running the MCP Client. So, if you wanted to garner output from the Linux file command, file must be available on the path.
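A quick way to check that requirement from a script is a sketch like this, using only the standard library (the tool name is just an example):

```python
import shutil


def tool_available(name):
    """Return True if `name` resolves on this process's PATH, i.e.
    the same PATH an FPR command run by the MCP Client would use."""
    return shutil.which(name) is not None


# Example: confirm the Linux `file` utility is reachable before
# trying to invoke it from an FPR command.
if not tool_available("file"):
    print("`file` is not on the PATH of this environment")
```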

In the case of Python we need to do the same, but what's often not so obvious is that Archivematica, as a Python application, runs in its own virtual environment (there is one for each component, including the MCP Client). Within that virtual environment we can still see the system path, but we can only import the modules installed in the environment itself.

If you inspect the status of the MCP Client service, you can see where the virtual environment is:

[artefactual@analyst-vm ~]$ service archivematica-mcp-client status
Redirecting to /bin/systemctl status archivematica-mcp-client.service
● archivematica-mcp-client.service - Archivematica MCP Client Service
   Loaded: loaded (/etc/systemd/system/archivematica-mcp-client.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-05-05 15:44:26 CEST; 1 weeks 2 days ago
 Main PID: 2964 (python)
   CGroup: /system.slice/archivematica-mcp-client.service
           ├─1307 /usr/share/archivematica/virtualenvs/archivematica-mcp-client/bin/python /usr/lib/archivematica/MCPClient/archivematicaClient.py
           └─2964 /usr/share/archivematica/virtualenvs/archivematica-mcp-client/bin/python /usr/lib/archivematica/MCPClient/archivematicaClient.py

So, for me, that's /usr/share/archivematica/virtualenvs/archivematica-mcp-client/bin/
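A script can also report this about itself, which is handy when you're not sure which interpreter Archivematica is actually using to run your command. A minimal sketch using only the standard library:

```python
import sys


def report_environment():
    """Return the interpreter path and environment root for the Python
    running this script. Called from an FPR command, this would show
    the MCP Client virtualenv rather than the system Python."""
    return {"executable": sys.executable, "prefix": sys.prefix}


if __name__ == "__main__":
    for key, value in report_environment().items():
        print("%s: %s" % (key, value))
```

Running this as an FPR command would print a path under the virtualenvs directory shown in the service status above.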

At the moment we actually seem to have only one script importing an additional module that's not in the standard Python library: "Check against policy PLACEHOLDER_FOR_POLICY_FILE_NAME using MediaConch". So it's not something we've done much in scripts ourselves.

So where does that leave you?

I think modifying the virtual environment in a production environment, even for prototyping, would be risky (though you can always back it up and restore it). It's a little easier to work with new binaries in this regard, but then the environment is still being mutated somewhat. So I'd persist with this in your test environment only.

As developers, if we're taking a prototype script (Bash or Python) from testing to production, then we persist that change during deployment. In source control we'd update the OSDeps (Operating System Dependencies) for Bash-like scripts calling new binary utilities. For Python, we need to build the module into the setup for the virtual environment, i.e. the Python requirements.txt. (In that last link you'll see our MediaConch import right at the top: ammpc.) Then, when the venv is built or rebuilt by the pip install commands used to stand up Archivematica, the modules are available to you.

But yep, I'd consider steering away from running pip in a new script itself. Though it's interesting to hear that it failed, and it's pretty cool that you tried it. I might give that a whirl sometime.
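For what it's worth, if someone does need to drive pip from a script, the only supported programmatic entry point is invoking it through the interpreter with `python -m pip`; `pip.main()` was removed from pip's public API in pip 10, which would explain the "attribute not found" error. A minimal sketch:

```python
import subprocess
import sys


def pip_install_cmd(package):
    """Build the `python -m pip install <package>` command for
    whichever interpreter (and therefore virtualenv) runs this script."""
    return [sys.executable, "-m", "pip", "install", package]


def pip_install(package):
    """Install `package` into the current environment; raises
    CalledProcessError if pip exits non-zero."""
    subprocess.check_call(pip_install_cmd(package))
```

Using `sys.executable` matters here: it guarantees the package lands in the MCP Client's virtualenv rather than the system Python. (Still not something I'd do in production, for the reasons above.)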

Maybe have a look at what it would take, given the advice above, to make sure the dependency is already available in your environment when you start the client.
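One way to make that check defensive in the script itself is to fail fast with a readable message when a module is missing, rather than crashing on the import. A sketch using only the standard library (the module name is just an example):

```python
import importlib.util
import sys


def require_module(name):
    """Exit with a clear error if `name` is not importable in the
    environment (e.g. the MCP Client virtualenv) running this script."""
    if importlib.util.find_spec(name) is None:
        sys.exit("module %r is not installed in %s" % (name, sys.prefix))


# e.g. at the top of a validation command:
require_module("json")  # stdlib, so this always passes
```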

I hope that helps. Let us know how it goes.
Happy hacking!
Ross

Arif Shaon

May 15, 2020, 10:29:43 AM
to archivematica
Hi Ross,

Thanks very much for your reply. Most helpful indeed.

What I should have clarified in my original post is that I am using docker-compose to run Archivematica, which could be contributing to the failure of "pip install".

I will try a "standard" deployment of Archivematica, with the MCP Client as a service on the host machine rather than a service within a Docker container.

Thanks again for your help.

Best
Arif

Ross Spencer

May 15, 2020, 2:13:56 PM
to archivematica
Ah, Arif, that might be even easier then! Provided that you are running our docker-compose environment, then as long as your dependencies are listed in the requirements file base.in, you should be almost good to go. But I need to clarify that there's an extra step:

In the .../am/compose/ folder there is a Makefile; you need to look for this target:

user@artefactual:~/git/artefactual-labs/am/compose$ sudo make help
[sudo] password for user: 
bootstrap                      Full bootstrap.
bootstrap-dashboard-db         Bootstrap Dashboard (new database).
bootstrap-dashboard-frontend   Build front-end assets.
bootstrap-storage-service      Boostrap Storage Service (new database).
compile-am-requirements        Run pip-compile for Archivematica              <-- compile-am-requirements

That will pick up your changes from base.in and pin them for installation.

There's a piece of work in progress to improve the compose effort and make the compile-requirements piece work better. There's an open ticket we're working on: https://github.com/archivematica/Issues/issues/1039. If you have any problems with compile-am-requirements, please consult that issue for the various discussions and workarounds; the attached PRs might provide a model to help too.

Once the requirements are compiled, rebuild your container, and everything should update with the dependency available to the FPR for you.

If you're wrestling with compile-am-requirements then let us know and we can figure out a strategy.

Best,
Ross

Arif Shaon

May 16, 2020, 2:40:45 PM
to archivematica
Hi Ross,

Thanks for your help again.

I understand that rebuilding the Docker image to incorporate new scripts or processes is a viable option for development purposes. However, I am not sure that is an efficient and sustainable approach for a production environment.

For now, I have done what you suggested in your original email: deployed the custom Python script along with its dependencies on the MCP Client machine and made it available on the path. Of course, this has required a standard, non-Docker deployment of Archivematica, but it works really well.
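For anyone following the thread, here is a hedged sketch of what such a standalone validation command might look like. The original use case was xmlschema; to keep this self-contained, the sketch substitutes a stdlib well-formedness check as a stand-in for full schema validation, and the JSON output keys are an assumption modeled on Archivematica's bundled MediaConch policy-check command, so confirm them against your own FPR setup:

```python
import json
import sys
import xml.etree.ElementTree as ET


def validate(path):
    """Return an Archivematica-style validation result dict.
    NOTE: the output keys are an assumption modeled on the bundled
    MediaConch policy-check command; verify against your FPR config.
    ET.parse() is a stand-in for a full xmlschema validation."""
    try:
        ET.parse(path)
        return {"eventOutcomeInformation": "pass",
                "eventOutcomeDetailNote": "%s is well-formed XML" % path}
    except ET.ParseError as err:
        return {"eventOutcomeInformation": "fail",
                "eventOutcomeDetailNote": str(err)}


if __name__ == "__main__" and len(sys.argv) > 1:
    result = validate(sys.argv[1])
    print(json.dumps(result))
    sys.exit(0 if result["eventOutcomeInformation"] == "pass" else 1)
```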


Best
Arif

Ross Spencer

May 19, 2020, 8:19:57 AM
to archivematica
That's great. Thanks for the update, Arif. I hear you about the production environment too, and in general really: the FPR should be as flexible as possible to promote its exploration and use. It's an important piece to think about.

I'm glad you've got this up and running for now! :) 

Best,
Ross 