What is an Archipelago Digital Object?

Diego Pino

Aug 28, 2019, 11:28:56 AM
to archipelago commons
Hello folks,

This post is relevant in so many ways.

Today I was reading this great post by the great Rosie Le Faive (UPEI) https://groups.google.com/d/msg/islandora/FOT2-wv-_jU/m3JhtMbtBQAJ and I thought it was relevant to discuss it in our Archipelago Commons context, since the same or a similar question will come up sooner rather than later.

Preamble:

When we decided to build Archipelago quite a few months ago, we were totally convinced that one of the most urgent commitments we needed to make, in this Drupal 8 context, was to define a unique and flexible way of describing, storing and extracting metadata; a lot of it. And that meant writing and maintaining the code responsible for doing that.

A CMS like D8 is not too different from a DAM / digital repository, but there are some peculiarities that make asset management and definition (needs, hopes) a bit different between those two classes of systems.
In a CMS like D8, it's all about formatted content: hopefully simple to publish, normalized and web-ready content. Content can be a blog post, an article, in general some sort of HTML representation with media, social interaction, etc. It's about publishing websites!

So, Drupal 8 is good at it. Very good at it. It allows you to have Content Types, and each one can have fields (key/value pairs like "Subject": "Digital Objects") attached that hold the data. It's optimized to store many values for each field. And it's optimized for the following workflow:
1.- Define what fields (different types of things) you want and their cardinality (a strong decision)
2.- Create a Content Type and attach the fields. Set up how it will look and how you will add the values (entry)
3.- Create new Content pieces (many) for each of the defined Content Types (there is a schema in place, meaning your database is set to receive data in that form/shape).

If the schema does not work for you, and you already have data (key difference), then you create a new Content Type, set up all configs, formatters, displays, permissions, etc. and add new, similar, or dissimilar fields. Repeat.

This is great for web-native content: e-commerce, products, posts, static pages. You normally don't need too many fields and, to be honest, the big chunk can always go into unstructured fields, like a "body" in HTML or a "paragraph" (advanced D8 module).

Well, things change if you are trying to mimic/fit metadata schemas or ontology-based data (XML or RDF) where the shape/structure can be modal, a bit deeper, nested, plastic and evolving (always emphasis on the evolving). But mostly, a lot. A lot of different values, and not always all of them present, even when talking about the same kind of thing.

Months before writing the first pieces of code we found (read, researched and proven in practice) one of the limits of the D8 CMS approach to content when applied to metadata needs. It's a fixed number (integer) limit. 60. 60 fields.
It happens (tech stuff, skip if not interested) that Drupal 8, when it "loads" a piece of content for display/edit, does an SQL JOIN to fetch the content itself and all the values for each attached field (one database table per field). And when using the default MariaDB setup, there is a well-defined limit on how big your join can be.


The real number is 61. But then you need to count the content itself as one extra table, so in practice fewer than 61 fields. There are other limits in place too, but this one seems like a rule you can only break by, e.g., swapping MySQL for PostgreSQL (see https://www.postgresql.org/docs/current/planner-optimizer.html). Still, those joins are a performance killer when there are too many. I would love to hear other people's opinions/experience about this, but this seems like a limit websites built on D8 will rarely hit, except in our context, when doing metadata work like the one we all do in the GLAM world but also in bioscience, research data, etc.
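The arithmetic behind that limit can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the table names (`node_field_data`, `node__field_*`) are placeholders and not taken from this post, and the generated SQL is not the query Drupal actually issues. The only number taken from the text is the cap of 61 tables per join.

```python
MAX_JOIN_TABLES = 61  # join cap discussed above (MySQL/MariaDB)


def tables_in_join(field_names):
    """One table for the content itself plus one table per attached field."""
    return 1 + len(field_names)


def build_load_query(field_names, base_table="node_field_data"):
    """Build an illustrative SELECT joining every field table to the base table.

    Table and column names are assumptions made for this sketch.
    """
    needed = tables_in_join(field_names)
    if needed > MAX_JOIN_TABLES:
        raise ValueError(
            f"{needed} tables exceed the join limit of {MAX_JOIN_TABLES}"
        )
    joins = [
        f"LEFT JOIN node__{name} ON node__{name}.entity_id = {base_table}.nid"
        for name in field_names
    ]
    return "\n".join([f"SELECT * FROM {base_table}", *joins])


sixty_fields = [f"field_{i}" for i in range(1, 61)]
print(tables_in_join(sixty_fields))  # 60 fields + the content table
```

With 60 fields the join still fits; adding one more field makes `build_load_query` raise, which is the "60 fields" ceiling in practice.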

So, our experience said that many times we would need to have more than 60 fields. And also, many times we would like to have more than just key(s) and value(s); we also like hierarchies (LoD, labels and URIs). And many times we would want to add or remove keys on the fly (which sadly is a burden when you already have content in Drupal). So we sketched and talked to people while walking one night under the big sky of Montana, and finally we created this single Drupal field named Strawberryfield. A field that stores, processes, validates, exposes and uses JSON natively to store many types of data in deeper hierarchies. As John Lennon wrote:

Let me take you down
'Cause I'm going to 
Strawberry Fields
Nothing is real // Because we expose to Drupal the internals of this hierarchical structure as single Keys and Values even when our metadata reality has deeper roots.
And nothing to get hung about // No worries! We got your loved metadata.
Strawberry Fields forever // Digital Preservation but also because we feel the solution will stay quite a long time around

Anyway, wonderful song, and a side note: the Archipelago architecture was inspired by the song. Also related: https://en.wikipedia.org/wiki/Fragaria_chiloensis, from my dear country of origin, is one of the reasons you can enjoy modern strawberries.

And even if that data was shaped differently, and was small or large, it was still a single Strawberryfield. Which meant a single field, a single JOIN and also no need to create new Content Types to accommodate different types of data. (You can of course, you just don't need to.) Of course some people freaked out and some others felt we were almost breaking the immutable laws of D8; that is fine (also in the song). But D8 was made to be extended and we did so, in the D8/Symfony way. We are still convinced it was the right choice.
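To make the "deeper hierarchy stored once, exposed as single keys and values" idea concrete, here is a minimal Python sketch. It is not how the Strawberryfield module is implemented; the function names, the dotted-key convention and the sample JSON document are all assumptions made for illustration.

```python
import json


def flatten(value, prefix=""):
    """Yield (dotted_key, scalar) pairs for every leaf of a nested JSON value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        # List items share the key of the list that holds them.
        for child in value:
            yield from flatten(child, prefix)
    else:
        yield prefix, value


def flat_index(raw_json):
    """Group every leaf value under its flattened key."""
    index = {}
    for key, leaf in flatten(json.loads(raw_json)):
        index.setdefault(key, []).append(leaf)
    return index


# Hypothetical metadata document held in one JSON field.
raw = json.dumps({
    "label": "Strawberry",
    "type": "Digital Object",
    "subject": [
        {"label": "Fragaria chiloensis",
         "uri": "https://en.wikipedia.org/wiki/Fragaria_chiloensis"},
    ],
})

for key, values in flat_index(raw).items():
    print(key, values)
```

The nested document stays in a single field (one table, one JOIN), while the flattened view gives simple keys such as `subject.label` that the rest of the system can treat like ordinary keys and values.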

So, what is a Digital Object for us?

Back to the question in our Archipelago context. What is a Digital Object for us? Any Content that bears a Strawberryfield. You can name it whatever you want. We ship two: an all-purpose Digital Object Content Type and a Digital Collection one. We could have gotten away with just one, but it's always good to be explicit for the sake of backwards compatibility and common places. Both Content Types have a Strawberryfield attached named "descriptive metadata", and that makes them Digital Objects and makes them act and react as Digital Objects. You can add a new Content Type, attach a Strawberryfield, and the wonderful machine of metadata mangling will work on that too.

And we added some logic that allows us to identify what carries our notion of metadata/DO.


There is a small, simplistic service (cached) that checks if a given Content Type bears a Strawberryfield. If so, we allow all the other logic to happen, like file deposit, full digital object deposit, events, JSON embedded data services, etc. It's a simple approach; I can't state it's fancy, nor say it's universal, nor say it's finished (it never will be). It works for us and we feel it will work for you. It gives us a clear programmatic differentiation between Content made for the web (per se... not sure how to express it right: the rest of what Drupal provides) and a digital representation of something else expressed via metadata and attached/linked files. We totally encourage you to have both and mix them: normal Drupal nodes for exhibit building, static pages, help, blog posts (love those), etc. and also ADOs (Archipelago Digital Objects? not sure about the name really) for describing complex metadata-based entities.

The same service could then be invoked in other scenarios (like Rules, APIs, Views, etc.).
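The shape of such a check can be sketched in Python. This is a toy model, not the Archipelago service: the registry, the Content Type and field machine names, and the `strawberryfield_field` type string are all assumptions made for the example, and `lru_cache` simply stands in for whatever caching the real service uses.

```python
from functools import lru_cache

# Hypothetical field type identifier used only in this sketch.
STRAWBERRY_TYPE = "strawberryfield_field"

# Hypothetical registry: content type -> {field name: field type}.
CONTENT_TYPE_FIELDS = {
    "digital_object": {
        "title": "string",
        "field_descriptive_metadata": STRAWBERRY_TYPE,
    },
    "digital_object_collection": {
        "title": "string",
        "field_descriptive_metadata": STRAWBERRY_TYPE,
    },
    "article": {"title": "string", "body": "text_with_summary"},
}


@lru_cache(maxsize=None)
def bears_strawberryfield(content_type):
    """Return True if the content type has at least one Strawberryfield attached."""
    fields = CONTENT_TYPE_FIELDS.get(content_type, {})
    return any(kind == STRAWBERRY_TYPE for kind in fields.values())


def on_save(content_type):
    """Only content that bears a Strawberryfield gets the Digital Object logic."""
    if bears_strawberryfield(content_type):
        return "run digital object logic"
    return "plain web content, nothing extra to do"


print(on_save("digital_object"))
print(on_save("article"))
```

Because the answer depends only on the Content Type, caching it per type is cheap, and the same predicate can gate any other code path that needs to tell Digital Objects from plain web content.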

You can access, reference and link to an individual object via http(s)://yoursite.io/do/uuid (not fancy either, but pretty close to a PURL), instead of via e.g. node/6, which is the default D8 way.
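As a small illustration of that URL pattern, the following Python sketch builds a `/do/{uuid}` link from a site base URL. The helper names and the sample UUID are made up for the example; only the `/do/uuid` shape comes from the text.

```python
import uuid


def do_path(object_uuid):
    """Return the /do/{uuid} path, rejecting values that are not valid UUIDs."""
    parsed = uuid.UUID(str(object_uuid))  # raises ValueError on malformed input
    return f"/do/{parsed}"


def do_url(base_url, object_uuid):
    """Join a site base URL and the /do/{uuid} path."""
    return base_url.rstrip("/") + do_path(object_uuid)


# Sample UUID invented for this example.
print(do_url("https://yoursite.io/", "12345678-1234-5678-1234-567812345678"))
```

Unlike node/6, a link like this does not depend on a serial database ID, which is what makes it behave a bit like a PURL.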

Hope this clarifies what we see as a DO in our context. What is a Digital Object for you?

Thanks for reading this. I'm on vacation, but today I can't seem to escape my personal strawberry fields. I hope I did not repeat myself too much. Thanks to you too, Rosie; your post is a good technical one for all of those using Drupal 8/9 but also a great philosophical read. Is data what makes a system? Or is it its logic? Or something in between? Or neither? Maybe it's the community and the ideas they all agree on? Lots to think about.

Best

Diego Pino
Metro.org

