Archipelago 1.5.0 Release Announcement

dp...@metro.org

Jun 10, 2025, 7:37:30 PM

to archipelago commons

Dearest all,

I hope this message finds you all in peace, content enough, in a reflective mindset (fitting for the times and world we are living in), enjoying (maybe?) some sunnier weather (if you are in the northern hemisphere) or a mild winter if, like me, you live on the opposite side of the globe or recently crossed the equator towards the south.

As is our tradition, just before a new Open Repositories Conference, we are ready to announce a new version of Archipelago Commons: 1.5.0, with an appropriate (and musically inspired) slogan “People got to be free” by Allison Sherrick.

It will be very hard for me to provide you all with the same level of precision, politeness and attention to detail that Allison provided for our previous (1.4.0) release, but I will do my very best to be at least informative and also, this particular time (strange for me), as brief as one who is never brief enough can be.

First, facts:

Local deployment (please enjoy the new theme!) https://github.com/esmero/archipelago-deployment/tree/1.5.0

Live/Production Deployment (please enjoy the new theme with different colors here too!)

https://github.com/esmero/archipelago-deployment-live/tree/1.5.0

And our updated roadmap for 2025: https://github.com/esmero/archipelago-deployment/issues/285

Also, documentation reflecting the latest branches (https://docs.archipelago.nyc/1.5.0/), which will have major updates during July, including new Twig template recipes and How-to entries.

All modules also had their proper releases either as 1.5.0 (strawberry field core modules) or 0.9.0 (AMI and Strawberry Runners)

And, after OR2025, Allison (who will be presenting at OR2025, don't miss it!) and I will start sharing our team’s Summer/Autumn agenda for Workshops/showcases and tutorials related to 1.5.0. There is a lot that requires/deserves a show and tell.

Now to the less concrete/softer things:

This release cycle has probably been the longest since Archipelago’s conception (7 years ago as an idea, and a bit more than 6 years as a beta1 child of ours). We apologize for that, but good things also come to people who know how to wait (and test code over and over). Reasons are many, but I might blame (other than myself) the amount of effort/time the newer AI/Bot economy took from our team, as we tried (and somehow succeeded) to keep our/your well-kept & groomed (and well-intentioned) cultural heritage realm running 24/7, while the internet shifted into non-stop harvesting of everything-that-looked-like-data, and thus of your well-curated subjects, facets and descriptions. Related to that topic, we have some news (but you have to read to the end!).

Archipelago 1.5.0 brings an absurdly large number of new features, bug fixes and future-proof implementations, Drupal core update compatibilities and also configs, without introducing, as is our leitmotif, any deprecations or required changes to your data or to your understanding of the system. Since the discrete change log spans over 12 pages, I will try to describe only a few of the most wonderful/fancy additions, which are evenly distributed across 17K new lines of code, 4K removed lines and touching (gently) 204 PHP/JS files.

Our Batch Ingest/Mass ingest module, AMI, underwent a deep functional upgrade, introducing better background tools for very large spreadsheets (the CSV expander queue) and a new plugin for EAD2002 imports with the capacity of not only ingesting (and transforming) XMLs but also syncing complex nested components on existing ADOs and nested CSVs. This new feature also allowed us to (proudly) introduce AMI set Actions. You can now, using your well-loved AMI sets, patch, export, publish, delete and trigger post processing in the background without time or count limits. Allison and I (true/fact/wow) once (false, twice) ingested 600K ADOs using a single AMI set. Even reports got new love.

In this release we are also happy to introduce Search API background indexing. Ever ingested thousands of ADOs (we know you do that every day, AMI is a gem) and did not know for sure if Solr was up to date yet? Had to wait longer for everything to be findable? Hydroponics can now help/contribute to Indexing during each wake-up cycle. And it does so in a way that is memory/process aware, pausing until the next cycle if half of the available memory has been used, so other tasks like ingesting and actions over ADOs can keep running until they finish.
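To make the "stop at half of the available memory" idea concrete, here is a minimal sketch in Python. It is not the Hydroponics or Search API code (which is PHP); the function name, the batch size and the way memory is measured are all assumptions made up for illustration.

```python
from typing import Callable, Iterator, List


def index_during_cycle(
    pending: Iterator[str],
    index_batch: Callable[[List[str]], None],
    memory_used: Callable[[], int],
    memory_limit: int,
    batch_size: int = 50,
) -> int:
    """Index batches until the queue is empty or half of memory_limit is in use.

    Hypothetical helper: returns how many items were indexed in this cycle.
    Items that were not reached stay in `pending` for the next wake-up cycle.
    """
    indexed = 0
    batch: List[str] = []
    for item in pending:
        batch.append(item)
        if len(batch) < batch_size:
            continue
        index_batch(batch)
        indexed += len(batch)
        batch = []
        # Check the budget only between batches, so no item is left half done.
        if memory_used() >= memory_limit / 2:
            return indexed
    if batch:
        # Flush the last, possibly smaller, batch.
        index_batch(batch)
        indexed += len(batch)
    return indexed
```

Passing the memory probe in as a callable keeps the sketch easy to try out with a fake probe; a real worker would read the process memory instead.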

Probably the largest update happened in Format Strawberryfield, from updated and additional Embargo options and settings (so many new ones; bots, you had it coming!!) to 4! (yes, not only 3 but 4!) Mirador versions, IIIF Content Search API with Metadata Search capabilities (not in the specs but needed) and Annotations/ML/OCR/Pure Text/XML highlights, new and better filters and user-facing Twig extensions capable of dynamically loading Views and Annotations after the main PHP pages have been delivered. Even the Internet Archive Bookreader got new code and (did you know?) can display video and audio too. We basically re-wrote the core Drupal Ajax and Facet backend and introduced new APIs (e.g. one that transforms miniOCR into annotations to also support ML detections). Oh. And you will love the new Date Range facets (with animated histograms) that tap into the obscure (and fast) world of the Solr JSON Facet API.
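For anyone curious about what a date range facet looks like at the Solr level, here is a small sketch in Python that builds a JSON Facet API request body. It only illustrates the general request shape; the field name, the facet label and the date window are invented, and this is not the request Archipelago itself generates.

```python
import json


def date_range_facet(field: str, start: str, end: str, gap: str) -> dict:
    """Build a Solr JSON request body with a single range facet.

    `field` is a hypothetical date field; `gap` uses Solr date math
    (for example "+10YEARS"). Each bucket could feed one histogram bar.
    """
    return {
        "query": "*:*",
        "limit": 0,  # only the facet buckets are wanted, not documents
        "facet": {
            "by_date": {  # arbitrary label chosen for this example
                "type": "range",
                "field": field,
                "start": start,
                "end": end,
                "gap": gap,
            }
        },
    }


body = date_range_facet(
    "date_created", "1900-01-01T00:00:00Z", "2000-01-01T00:00:00Z", "+10YEARS"
)
print(json.dumps(body, indent=2))
```

A body like this would be POSTed to a collection's query endpoint; the response carries one count per bucket.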

The Webform Strawberry Field module now brings ORCID autocompletes, a new IR-inspired Agent Composite field and data validation before editing, effectively protecting JSON keys from destructive edits if the webform is set up to handle data differently (and letting you all know too). But probably your favorite feature will be Huge File Uploads without time-out or size restrictions. We implemented the complete TUS protocol, allowing resumable/chunked uploads to happen. Ever needed to upload a few gigabytes of web archives without asking your devops friends to bump configs? Now you can.
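As a rough illustration of why TUS uploads can resume, here is a minimal Python sketch of the client-side bookkeeping: given the total size and the offset the server reports, it plans the remaining PATCH requests. The header names come from the TUS 1.0.0 specification; the function name and the chunk size are assumptions, and this is not the module's actual code.

```python
from typing import Dict, Iterator, Tuple

TUS_VERSION = "1.0.0"


def plan_patches(
    upload_length: int, server_offset: int, chunk_size: int
) -> Iterator[Tuple[Dict[str, str], int, int]]:
    """Yield (headers, start, end) for each PATCH still needed.

    `server_offset` is what a HEAD request on the upload URL would return
    in its Upload-Offset header, so an interrupted upload restarts there
    instead of at byte 0.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= server_offset <= upload_length:
        raise ValueError("server_offset is outside the upload")
    offset = server_offset
    while offset < upload_length:
        end = min(offset + chunk_size, upload_length)
        headers = {
            "Tus-Resumable": TUS_VERSION,
            "Upload-Offset": str(offset),
            "Content-Type": "application/offset+octet-stream",
        }
        # The PATCH body would be the bytes file[offset:end].
        yield headers, offset, end
        offset = end


for headers, start, end in plan_patches(10, 4, 4):
    print(headers["Upload-Offset"], start, end)
```

Because every chunk is a small request of its own, no single request has to outlive a server time-out or fit under a body size limit.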

Strawberry Runners have better time/timeout management and all our previous ML tools are better. We have interactive (hover and click) image-based similarity queries, a new ML model (ViT) and chained ML processing, so one ML model can be used for segmentation while others generate single embeddings for each annotation produced by the former.

Again, I am trying (hard) to be brief, so please check the fuller change logs and test our deployment strategies to learn more.

Our server stack also changed and evolved, and the changes are significant. We moved from PHP 8.1 to a very fast and custom PHP 8.3 (which required a review of code deprecations everywhere). Solr was upgraded to the latest of the latest, from 9.1 to 9.8.4 (fast/slimmer), Cantaloupe is new too, runs on Java 23 and has our very own fixes (we started contributing to its official development and maintenance last year), DBs are fresh, the NLP/ML containers (3 of them!) are also new, and then we have Anubis. Anubis, a new member of the family (OSS/from Canada with Love), is an Application Firewall that asks browsers for a proof of work (a cryptographic challenge) to deter/stop headless bots and/or make huge swarms of harvesting requests give up because of the high cost (CPU & Memory) of solving the challenge, plus a rule system. We provide a custom docker container (Dumpling, our beloved octopus, shows a mad face if the challenge fails!) and a customized/detailed Allow rule set to work well with NGINX and Drupal. The config itself is quite complex but well documented in the Archipelago Deployment Live installation guides for this release.

All this said, Archipelago is not code (so why release? oh.. just let me roll with feelings here). It is not Devops or Docker containers, not YAML files (even if this release brings 10K new lines of configs just for the local deployment), not even (meta)data or images or specs or APIs. Archipelago is a space that exists (and fits ideas/implementation/values) to sustain your and our efforts, your and our time, your learning and testing, your trust (a rare privilege), your care, your questions, your doubts and your/our solutions, and that space (and the time/intersections/gravitational waves) allows a diverse group of people to respectfully coexist as a community of peers. And it keeps going because it is being actively used, every day, by all of you and also by us. The idea of an OSS community sometimes feels almost like a fictional construct, an anonymous cloud of people and institutions circling around tech, code that changes, evolves and deprecates, people that come and leave. But in our experience, so far, the code/the implementations and our efforts are equally orbiting you all. Nobody here is anonymous and we know each other well. It is a conversation. This mutual evolving/revolving (and also the choice of not) permeates both realms, which support each other. We are grateful for that and we aspire to keep that delicate balance (like a garden) alive through care in the years to come.

These tiny notes would not be complete without thanking my colleague Allison Sherrick for all her efforts, support (personal and community wise), tech/engineering and data labor, testing, twig templating, documenting, presenting, ingesting, discussing and suffering me on a daily basis (repeat/cleanse). Danke Allison. And of course I want to thank METRO: our executive director Nate and all our colleagues (with special mention to Anne and Shelly) and friends that support all our efforts and provide the conditions that made, are making and will continue making this OSS project a long term sustainable effort.

As with previous Archipelago editions, we would also like to thank our community members and in particular contributors to Archipelago’s functionality/use cases and wider presence in this world, which as of today includes, just in our team's managed repositories, over 4.2 million digital objects and dozens of times that number of OCRs, annotations and facets, languages, themes and colors:

Thanks to Johannes and the Bavarian State Library team, Giancarlo and Anna (since day one, Gracias), Mike, Megan and Roland, Scott, Lucy, Ruairí, Pat, Alessandro, Amy; Joanna and Corinne at Union College, Robert, Laura and the entire Washington DC Public Library repository team, Devan, Erik and Lisa and everyone at the SDSU repository team, all our friends and colleagues at the NY State Archive: Michelle, Andrew, Laura, Jasmine and Mario; Shay, Gisella and David at Hamilton College, Max and David at WWU, Sarah, Jennifer and Zack, Ilya and the Webrecorder team, Brenden and Jen at RPI, Cristy and Martha at Barnard, Carl at MIT, the hardworking archival team at Revs Institute and of course the IIIF team Glen and Caitlin, our dear friends in Mexico/UNAM and the wider Drupal community. Also thanks to everyone that is no longer around but added so much to what we have become. To everyone that I failed to individually or institutionally name (my apologies, I am a faulty human being): thank you, you are equally appreciated.

Have a good night. Adios.

in a pickle.jpeg

Diego
