Archipelago 1.0.0-RC3 is released! Please rejoice

dp...@metro.org

Nov 30, 2021, 2:51:55 PM
to archipelago commons
Dearest people and inhabitants of cozy islands and remote archipelagoes.

It took us some time and effort to get to this stage. 1.0.0-RC3 (the semantic version) is just a short, tiny and shiny passage that leads straight to 1.0.0 without any new features being added. And we reached this point today.

What comes now, for the next 2 weeks, is public consumption: sharing this with you to get feedback, comments and suggestions, maybe finding a few bugs that escaped our vigilant watch, or changing a color here or there. But this is finally done. It has been 3 years since we opened our code on GitHub with no one using it yet (wow), and since then the community, the early implementers, the use cases and, why not, the love and trust for Archipelago have grown immensely. We have also kept the promise of a system that is stable, does not leave anyone behind, can grow with your own knowledge, workflows and will to explore, and removes cataloging and metadata assumptions from the codebase. Our last release was 6 months ago, and 1.1.0 will be out in another 6 months (May 2022). We think releasing every 6 months is fair: releasing often is good for DevOps but not so wonderful for humans/practitioners.

I have to say 2021 has been an emotional rollercoaster. It was not an easy year: isolation, loss and sometimes even despair for many in their personal and work lives. Keeping focus on Code/Release/DevOps has been hard, but it has also been a way of having a larger purpose, thinking of all the amazing use cases, the beautiful and important Assets and Digital Objects you have brought and shared with us, and the hope (in your eyes) we can feel in each Zoom call when we share how Archipelago approaches the idea of a Digital Objects repository. If you are one of those hopeful people, this is for you. Thank you! We are so grateful. If not, maybe this is a chance to give it a try and build your own idea of what an open, flexible system like this can do for you.

This has also been a process of nurturing a community, being patient and watching it slowly grow. Communities cannot be forced into shapes. Some are more silent, others a bit more fearful, others outspoken; some share needs, others share their diversity; sometimes they are a bit unbalanced, sometimes cute and tiny too. Ours is what it is: one of hard-working people, practitioners of our diverse field who are finding their place and their voices in a space that is sometimes too technical and too often driven by technocracy. And one important aspect: here everyone is welcome, you never pay to play, you are never silenced, and your opinion matters and feeds us all.

Now to the concrete:

Local Deployment Machine (it's also the default branch)

Same as before. The documentation has been updated and this is now a Drupal 9-only release. There is also an "Upgrade to RC3 from D8" document we dearly invite you to try out: https://github.com/esmero/archipelago-deployment/blob/1.0.0-RC3/docs/upgradeFromD8ToD9.md (look for English typos; not my first language, as you know)

Production Deployment Machine (it's also the default branch)

This one is cloud-enabled, with SSL certs, S3 storage and all the fancy stuff you need to run your large or small archipelagos super fast and reliably on the internet.

So you want to know what is new? Gosh, people, so much! Let's start.

New Features List:

StrawberryField:
- Technical metadata now reacts to the number of files attached to a Digital Object. If you ingest a single Object with 300 images (we have done this), Archipelago will generate a reduced set of EXIF etc. to keep your JSON snappy. This has been tested with the largest JSON Objects you can imagine and everything works perfectly.
- Option to remove temporary files immediately after ingest to keep your space usage slim.
- Options for where (prefixes) in your S3 buckets Digital Object dumps and actual binaries go, plus developer-only hooks to modify all this with metadata awareness.
- ALTO support on hOCR-exposed Solr endpoints.
- Smart breadcrumbs with parentship-aware caching (this will lead to some amazing ACL work in the future). Why smart? Because if an Object belongs to multiple parents, you can globally choose to use either the longest path or (amazing) the most representative one, i.e. the one whose sub-paths are most common across all hierarchies.
- Lots of refactoring of Symfony services to make everything more reliable and faster.
- JSON Key Name Providers (plugins that selectively take your data from the JSON and expose it to Drupal/Solr) have been improved, but most importantly you now have a D3.js-driven graph that shows everything on a single screen: your fields, your properties, the Solr fields fed by those, and even the facets, with data simulation (you pass a node). This is just huge because it tells the story of your data, showing how it all flows from RAW to exposed.
- Media info extraction for video and audio (all details: frame rates, compression codecs, etc.).
- New data structure (Solr) for Strawberry Flavors, which includes all the work around NLP (sentiment, agents, etc.). This applies to OCR and web page text extraction (WACZ).
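To make the two breadcrumb strategies concrete, here is a minimal Python sketch. It is not Archipelago's code (which is PHP); it assumes each candidate breadcrumb is a list of ancestor titles from the root down to the direct parent, and the "most representative" score used here (the sum of how many candidates share each of the path's prefixes, ties going to the longer path) is a hypothetical reading of the description, not the actual algorithm.

```python
from collections import Counter


def longest_path(paths):
    """Return the candidate breadcrumb with the most ancestors."""
    return max(paths, key=len)


def most_representative_path(paths):
    """Return the candidate whose sub-paths are most shared across all hierarchies.

    Hypothetical heuristic: count every prefix of every candidate, then score
    each candidate by summing the counts of its own prefixes. Ties go to the
    longer path; if still tied, the first candidate wins.
    """
    prefix_counts = Counter(
        tuple(path[:i]) for path in paths for i in range(1, len(path) + 1)
    )

    def score(path):
        return sum(prefix_counts[tuple(path[:i])] for i in range(1, len(path) + 1))

    return max(paths, key=lambda path: (score(path), len(path)))


if __name__ == "__main__":
    candidates = [
        ["Collections", "Photographs", "1920s"],
        ["Collections", "Photographs", "Portraits"],
        ["Exhibits", "Spring 2021", "Room A", "Case 3"],
    ]
    print(" > ".join(longest_path(candidates)))
    print(" > ".join(most_representative_path(candidates)))
```

With these candidates the longest path is the exhibit chain, while the prefix-sharing rule picks a path under Collections > Photographs, the part two of the three hierarchies have in common.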

Format Strawberry Field:
- Every Formatter was refactored
- New settings to select which JSON keys to fetch media from. This allows you to fine-tune exactly how each formatter, e.g. Video, will source its files.
- Embargoes! Metadata-driven embargoes using an expiration date and also IPv4 addresses and ranges. This includes a special caching mechanism that, via cron, periodically invalidates caches based on embargo dates that have passed (if any), so renders are always fresh. It also provides specialized alternative keys for media to be shown while an embargo applies (meaning you can have a HUGE PDF and also a small watermarked one that is only shown when the embargo is applied). It includes a special "Bypass" embargo permission that can be granted to Drupal user roles that should be able to see any Object. Embargoes also pass context data to Twig templates and are applied via a service, so the calculation happens only once per URL even if multiple formatters, Views, etc. are in place, and of course it is inherited by file download URLs. We ship this disabled by default.
- The 3D formatter is now able to read OBJ, MTL and all the textures associated in the MTL to generate a fully shaded/UV-textured model. It also allows you to download a screenshot of the current render and has better max/min zoom.
- Video/Audio with complete subtitle support. Files are grouped by their upload JSON key.
- New Twig extension, including JSON decode, and better caching for our Metadata Display entities.
- The Metadata Display (Twig) preview got revamped and now ships with the public theme, so you can preview exactly the way public users will see it.
- Every JS library was updated
- Annotorious got a big update and now allows polygon editing, amongst many other UI/UX improvements.
- Panoramas and tours now have a better WebGL max sizing strategy and work well on phones, where the max render texture size is limited. Also updated to work better on D9.
- Mirador 3 got a bit of overriding on our part to allow multiple videos and audios to work at the same time, even across multiple workspaces. Mixed manifests work great now.
- replayweb was also updated and allows deep linking and a Solr/Global Search to viewer connection.
- Many PDFs? They can all now be grouped under a single PDF viewer with a select box that dynamically updates PDF.js. So good!
- Large update on how Metadata Displays are connected to formatters/Views. Instead of the ID we now use the UUID (with a special Entity Autocomplete that can deal with that, new to Drupal and developed IN-HOUSE). This means that the order of ingest does not matter and your own Twig templates will never be overwritten by the ones we ship.
- Contextual menus for direct access to Metadata Displays! Don't know which Twig template is driving your Mirador? Press the tiny "Pen" icon on top of the formatter and you will see a list from which you can go directly to edit the template. This adds a LOT of context/Twig-importance awareness to the environment.
- Markdown to HTML Twig extension! No HTML in your metadata, you do not need it. Use simplified Markdown and transform it to HTML on the fly in your templates. Clean and cool.
- Minor fixes and updates / bug fixes
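A rough way to picture the embargo decision is a date check plus an IPv4 allow-list, with a bypass for privileged roles and an alternative media key while embargoed. The Python sketch below is only an illustration under those assumptions: the metadata keys (`embargo_until`, `embargo_ip_allow`, `embargo_alt_media_key`) and the default `documents` key are hypothetical, not the names Archipelago uses, and it treats the IP list as addresses that may see the Object during the embargo.

```python
import ipaddress
from datetime import date


def is_embargoed(metadata, client_ip, today, can_bypass=False):
    """Decide whether a Digital Object should be rendered as embargoed.

    Hypothetical metadata keys:
      "embargo_until":    ISO date string; the embargo lifts on that day.
      "embargo_ip_allow": IPv4 addresses or CIDR ranges that may see the
                          object while the embargo is active.
    """
    if can_bypass:  # e.g. a Drupal role granted a "bypass embargo" permission
        return False
    until = metadata.get("embargo_until")
    if until is None or today >= date.fromisoformat(until):
        return False  # no date embargo, or it has already expired
    ip = ipaddress.ip_address(client_ip)
    for entry in metadata.get("embargo_ip_allow", []):
        # strict=False tolerates host bits in a range, e.g. "192.0.2.1/24"
        if ip in ipaddress.ip_network(entry, strict=False):
            return False
    return True


def media_key(metadata, embargoed, default_key="documents"):
    """Pick the JSON key to source media from (hypothetical key names).

    While embargoed, an alternative key (e.g. a small watermarked PDF) is
    used if the metadata provides one; otherwise None means show nothing.
    """
    if not embargoed:
        return default_key
    return metadata.get("embargo_alt_media_key")
```

In this model the cron-driven cache invalidation only has to revisit Objects whose `embargo_until` date has been reached, because that is the only moment the answer changes for a given client.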

Webform Strawberryfield:
- New LoD elements and endpoints. MeSH and SNAC plus Europeana API enabled by default.
- Better caching and routing of LoD Endpoints and respect for result count (even if the source API does not allow it)
- Autosave of sessions is smarter. It does not disappear if you are editing/creating new at the same time, allows clearing the previous session, and reports steps and errors better. In general, improvements to the SBF Webform handler also allow ingest of ADOs outside of node edit/create (e.g. self-deposit forms that are public).
- Range access to binaries was fixed to work also for local filesystem (when not using S3)
- Lots of Fixes and very strange Formstate/JS improvements
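For readers unfamiliar with range access: browsers and media players request parts of a large binary with an HTTP `Range` header (e.g. `bytes=0-1023`), which lets a video seek without downloading the whole file. The sketch below is not Archipelago's PHP code; it is a hypothetical, simplified Python illustration that handles only a single byte range (HTTP semantics, RFC 9110, also allow multiple ranges) served from a local file.

```python
import os
import re

# Only the single-range forms "bytes=first-last", "bytes=first-" and "bytes=-suffix".
_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_byte_range(header, file_size):
    """Return an inclusive (start, end) tuple, or None if invalid/unsatisfiable."""
    match = _RANGE_RE.match(header.strip())
    if match is None or file_size == 0:
        return None
    first, last = match.groups()
    if first == "" and last == "":
        return None
    if first == "":
        # Suffix range: the last N bytes of the file.
        length = int(last)
        if length == 0:
            return None
        return max(file_size - length, 0), file_size - 1
    start = int(first)
    end = int(last) if last else file_size - 1
    if start > end or start >= file_size:
        return None
    return start, min(end, file_size - 1)


def read_local_range(path, header):
    """Read the requested bytes from a local file, as a 206 response body would.

    Returns None when the range cannot be satisfied (a server would answer 416).
    """
    byte_range = parse_byte_range(header, os.path.getsize(path))
    if byte_range is None:
        return None
    start, end = byte_range
    with open(path, "rb") as fh:
        fh.seek(start)
        return fh.read(end - start + 1)
```

Clamping an over-long `end` to the last byte, instead of rejecting it, mirrors how servers commonly treat ranges that run past the end of a file.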

AMI:
- Islandora/Solr import with extra capabilities: offset, number of objects and also cleanup! All non-existing (empty) columns get removed after the batch-based fetch from Solr.
- More options during "Process"/ingest. You can ask AMI to process attached/remotely fetched files each in their own queue item, allowing HUGE complex files of almost infinite size to run without any timeouts. AMI updates the queue as it goes, fetching and making sure files exist before ingesting the ADOs.
- LoD reconciliation for AMI sets, plus CSV-based (with Webform elements) correction of automatically reconciled values, including inline correction or even downloading and uploading the CSV. All LoD elements are then injected into the AMI-to-JSON templating so you can use them directly when generating your JSONs.
- Remote Files and TECHMD are cached. So if you ingest, delete and reingest, second pass will be super fast. You can always force it to re-calculate if you do not trust your remote source to be stable.
- Webform-based search and replace. This was probably the most complex part. You can select a Webform and an element and use them to find/replace values in batch across a set of selected objects. This adds to the existing JSON-as-text and JSON Patch capabilities. All of them also have a simulation mode.
- CSV export. Get all your data as CSVs that can be directly used to reingest/update or even move to another repo. It also creates an AMI set for you if you want to.
- Better remote file download, more checks on metadata, you can name your AMI sets during setup, remote ZIP files and CSVs (s3) also work.

Basically a full refactor of everything; AMI is probably 60% of all of Archipelago's code.

Strawberry Runners:
- Improvements on how things are processed and cleanups.
- More NLP options for OCR/web page extraction.
- WACZ file index extraction and Solr indexing.
- Bug fixes, ALTO output (the settings are there but we might need to test more).
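WACZ files are ZIP packages, and the WACZ specification lists captured pages in `pages/pages.jsonl`, one JSON object per line after a header line, optionally with extracted `text`. The Python sketch below shows one way such a page list could be turned into flat documents for indexing; it is not the Strawberry Runners implementation, and the Solr-style field names (`url_s`, `title_t`, `text_t`) are hypothetical.

```python
import io
import json
import zipfile


def iter_wacz_pages(wacz):
    """Yield page records from pages/pages.jsonl inside a WACZ (ZIP) package.

    wacz may be a filesystem path or a binary file-like object. The header
    record on the first line is recognised by its "format" key and skipped.
    """
    with zipfile.ZipFile(wacz) as zf:
        with zf.open("pages/pages.jsonl") as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                if not line.strip():
                    continue
                record = json.loads(line)
                if "format" in record:
                    continue
                yield record


def to_index_docs(wacz):
    """Map page records to flat documents (hypothetical Solr field names)."""
    return [
        {
            "url_s": page.get("url"),
            "title_t": page.get("title", ""),
            "text_t": page.get("text", ""),
        }
        for page in iter_wacz_pages(wacz)
    ]
```

Streaming the JSONL line by line keeps memory flat even for web archives with many pages.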

Deployment and looks:
- Base theme was improved
- All PHP libraries updated to latest (except Bootstrap Barrio; we want to avoid a big break there, sorry)
- Drupal 9 latest
- JS libraries via Composer! (from npm or bower!)
- M1/ARM64 support everywhere with custom built Containers
- Upgrade path from D8 to D9 documented and tested
- New settings on Docker Compose files for both local and live instances
- New Twig templates, and every existing one was fixed, improved, made better. IIIF got a lot of love, so Video/Audio, PDF and images all work flawlessly at the same time. Also MODS 3.7, a better AMI ingest JSON template, and Object descriptions and abstracts with Markdown.
- New AMI set with more demo objects
- New Views for Searching pages and WebArchive Content with NLP facets
- New Solr OCR plugin (0.7.1)
- New Solr Schemas
- New Minio with Console.

New Documentation page: https://docs.archipelago.nyc

Ok, we could go on and on. There is way more behind the scenes, and Archipelago can take almost every use case we could gather. It's fast, performant, safe, and fun to use and extend.

All that is left to say is thank you to our team, Allison and Albert, for all their amazing and encouraging work: so much talent, initiative, good ideas, templating, documenting, bug finding, fixing, testing and patience, so much patience. You both are amazing. Giancarlo, best bug finder ever and dear team friend; Pat, for all your understanding with my code reviews and your great code additions and use cases for Archipelago; Derek, for all the testing and suggestions; Mike, for fixing the Minio Console; Don, for your documentation bug fixes and your tremendous contribution building tooling for command-line ingests; Carl, for asking, interacting and feeding your dome project with wonderful additions; Megan, for your EXIF/mediainfo use cases and your work adapting to Archipelago and integrating your media workflows; Megan Tyne, for all your encouragement and interactions building your own solution; Chuck, for the too many deployments and goodwill in learning; Jen, Tammy, Liz, Brenden, George and team, for being such good early and independent adopters; Jennifer and Zack at SENYLRC, for all that wonderful, caring work on EADC and the many use cases that have driven a lot of the new Webform functionality; the ESIE project, one of our first implementers, thank you; Lisa, Shay and team, for allowing us to be part of your migration path and future needs; Rainer Simon and Johannes Baiter, your hard work has a special place (at its core) in Archipelago.
@adolski, for all your talent and perseverance building Cantaloupe; Nate (METRO's director), for advocating, speaking, working and telling the world about our work, your part in this team is invaluable; the Advisory Board at large, for all your governance suggestions and help figuring out the future of our community (and ideas about ARK IDs); Ilya, Lorena and Emma, for all the amazing times we had working together with replay web and the Webrecorder team; and everyone in the community testing, using, breaking, fixing and improving Archipelago and helping each other. If I forgot someone, please forgive me, it was not my intention for sure!

Thanks, happy exploring, and please let us know if you find any issues. Here to help.

Diego Pino

PS: GitHub might see some movement, but rest assured all shared links/docs can be used right now.
PS2: Lots of typos. Sorry!

Diego Pino

Nov 30, 2021, 3:34:59 PM
to archipelago commons
Oh, sorry, I forgot 2 important features:
- AMI sets can also be previewed and tested against a Twig template, using either a row number or the title of the Object to be ingested.
- Hydroponics can be set to live for whatever time you want for post-processing (in 60-second increments) or to "do all that is pending" mode.
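The two Hydroponics modes can be pictured as a queue worker with an optional time budget. The sketch below is a hypothetical Python illustration, not Archipelago's service: `run_queue`, its `slots` parameter (number of 60-second increments, or `None` for "do all that is pending") and the injectable clock are names made up for this example.

```python
import time
from collections import deque


def run_queue(queue, process, slots=None, clock=time.monotonic):
    """Process queued items until a time budget runs out or the queue is empty.

    slots: number of 60-second increments to stay alive, or None for the
    "do all that is pending" mode. An item that starts before the deadline is
    allowed to finish, so the budget can be overrun by one item's duration.
    """
    deadline = None if slots is None else clock() + 60 * slots
    processed = 0
    while queue and (deadline is None or clock() < deadline):
        process(queue.popleft())
        processed += 1
    return processed


if __name__ == "__main__":
    pending = deque(["ocr:page-1", "ocr:page-2", "nlp:page-1"])
    done = run_queue(pending, lambda item: print("processing", item), slots=1)
    print(done, "processed,", len(pending), "left")
```

Letting an item that has already started run to completion, rather than interrupting it, is the simpler design choice for a sketch; a real worker would also need persistence and error handling.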


Now gone for good! Thanks!

--
Diego Pino Navarro
Digital Repositories Developer
Metropolitan New York Library Council (METRO)

Nate Hill

Dec 1, 2021, 9:24:31 AM
to Diego Pino, archipelago commons
Diego, the METRO team, and the entire Archipelago community,
Congrats and thank you as we have reached this exciting milestone: 1.0.0!

Years ago, when we first started this work it felt risky. Could we really take on a project like this, and grow and sustain a new open source software community? I'm so grateful to everyone that we have persevered and reached this point. It is super exciting.
For those on the list who do not know, METRO is a nonprofit chartered by the New York State Board of Regents, founded way back in the 1960s. We receive most of our funding through the New York State Department of Cultural Education (and really our State Library). While our primary responsibility is supporting libraries and cultural institutions in our region and our state, we are pleased to take on big projects (like Archipelago) that benefit the entire field. Only by working and participating both nationally and internationally can we tackle the biggest challenges that our local institutions face. I say all of this because I want to emphasize our commitment and sustainability plan for Archipelago. Every five years, METRO submits a plan of service to our state library. In that plan of service, we lay out our goals and our priorities. In the 5-year plan submitted just last year, I clearly committed to the growth and support of the Archipelago software and community. Furthermore, you should all know that we don't measure our success based on revenue; we measure our success based on adoption and use. METRO's Archipelago team and all of the maintenance work that goes into Archipelago are accounted for in METRO's annual operating budget. Rest assured that if you think Archipelago is the right choice for your work, METRO will be here seeing to it that the project is both maintained and always pushing new boundaries.

Finally, I'll just hint that this is not the last project of this type that we will be taking on at METRO :) We really believe that libraries and cultural institutions are at their best when they invest in their own knowledge and when they build and create their own tools to do their work. I'm always excited to receive emails from folks who have big ideas and who want to roll up their sleeves and get to work and solve problems. Don't be strangers! Whether you live in Brooklyn or Brazil, it is always good to connect and work toward a better (and open!) future.

Nate



--
Nate Hill
Executive Director
Metropolitan New York Library Council