Search implementation options


Aaron Pinero

Apr 20, 2021, 3:33:55 PM
to archipelago commons
I have a Drupal website hosted on Pantheon. I would like to build some of Archipelago's functionality onto it by utilizing the SBF modules. Right now, my organization is primarily interested in being able to import, store, and display EADs similar to what they do on https://www.empireadc.org

I see from the documentation that the distributed Archipelago container includes Apache Solr for search. Our service tier with Pantheon doesn't include the Solr service, so I am wondering whether Solr is required to search within the SBF-stored metadata, or whether I can use other search solutions with the Drupal Search API.

Thanks,
Aaron Pinero
Web Service Manager, Health Sciences Library
Columbia University Irving Medical Center

dp...@metro.org

Apr 20, 2021, 4:57:16 PM
to archipelago commons

Hi Aaron, welcome to our group,

Happy you are testing Archipelago out for finding aids / archival descriptions. FYI, the Empire ADC team, which is leading the pilot project using Archipelago for finding aids, has already built some wonderful Webform/Twig templates that output EAD2002 and EAD3 XML to their full extent (they are quite large and complete; on our side we may need to do more work as the project goes into production). Their work is meant to be shared and made available to the community, and we want to take part of it (with their consent) and include it in our next release. I'm also using this as a chance to give a shout-out to the great and caring work of Jen and Zack from Empire ADC =)

Regarding Drupal and Pantheon: we have no hands-on experience running the architecture on a fully hosted Drupal platform like Pantheon. Once you start exploring, I would ask you to share with us the complexities of doing so and what we can do, code-wise, to ease the process.

The SBF modules expose the internals of the JSON metadata as native Drupal field properties via configurable plugins named JSON Key Name providers. Each of these plugins can query the JSON and expose pieces of the complex values to Drupal as single "flat" values or lists of "flat" values, mimicking the way Drupal fields normally work and allowing the Search API to see the data natively. This, in theory, would allow the database-based Search API backend to index those values in its own DB tables and support faceting, Views integration, etc. That is basically the promise of the Search API: to allow a full swap of backends (Solr, Elasticsearch). That said, I know the DB implementation has many limitations and I would discourage you from going that route for a production server. Even the Drupal Search API documentation has a note about that:


"Database search
(Drupal 8 only)
This module provides a ready-to-use search backend that indexes and searches content using Drupal's own database. It is mainly meant for testing purposes and for smaller sites, larger sites will usually want to use a more powerful backend (like Solr or Elasticsearch). Also provided is the Database Search Defaults module which provides a complete pre-configured content search when installed."

That is one of the reasons we did not go for the traditional many-bundles, too-many-fields approach and created SBF instead. That approach simply does not scale very well, and the database can become a bottleneck in your system pretty fast if you have many assets or discovery is hit hard by traffic. Also, not every accompanying search_api module (e.g. facets) will behave the same way on the DB backend as on a backend better suited for search.
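To make the "flattening" idea described earlier a bit more concrete, here is a minimal sketch in Python. It is only an illustration of the concept, not the actual SBF code (the real plugins are Drupal/PHP and configurable); the function names and the sample metadata are hypothetical.

```python
from typing import Any, List


def _scalars(value: Any) -> List[str]:
    """Return every scalar leaf inside `value` as a string, in document order."""
    if isinstance(value, dict):
        out: List[str] = []
        for child in value.values():
            out.extend(_scalars(child))
        return out
    if isinstance(value, list):
        out = []
        for child in value:
            out.extend(_scalars(child))
        return out
    return [] if value is None else [str(value)]


def flatten_values(node: Any, key: str) -> List[str]:
    """Collect the flat values stored under `key` anywhere in nested JSON.

    Hypothetical helper: mimics how a key-name provider could expose one
    JSON key as a flat, multi-valued field that a search index can consume.
    """
    found: List[str] = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.extend(_scalars(v))
            else:
                found.extend(flatten_values(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(flatten_values(item, key))
    return found


# Hypothetical metadata resembling a small archival description.
record = {
    "label": "Library papers",
    "subject": ["Medicine", "Libraries"],
    "creator": [{"name": "Person A"}, {"name": "Person B"}],
}

subjects = flatten_values(record, "subject")
names = flatten_values(record, "name")
```

Whatever backend sits behind the Search API (DB, Solr, Elasticsearch) then only ever sees lists like `subjects` and `names`, which is why the backend can be swapped in principle.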

Our Solr Docker container (it's the official one, meaning no custom code) runs Solr 8.7 or 8.8 and can index millions of nodes and deliver real-time autocomplete, suggestions and search. The only customization we deliver is a special plugin for HOCR highlights, built by the dev team of the Bavarian State Library, plus the corresponding Solr schema/YAML files suitable for use in Drupal (you can see them in the persistent and sync folders of the archipelago-deployment GitHub repo). We also provide the IIIF Cantaloupe server for media, which will not be available on Pantheon either, but for finding aids you won't need it; PDFs and CSVs can still be delivered as files by Archipelago.

If you have other solutions available on Pantheon that are not DB-based (Elasticsearch?), that could be a better option as you move into production or even larger tests. But also, only as a suggestion, you could run a very small server with 4 GB of RAM on any cloud provider (AWS, Azure, Google) or on some internal Columbia IT infrastructure, with only the Solr Docker container plus the configs we provide, and let the Search API / Search API Solr module connect to it remotely from Pantheon. Search API Solr can use any remote server URL and can also authenticate if you enable that option. The cost would be on the lower side, you could grow only if you need to, and you would get many benefits. Solr is one of the least troublesome services and will rarely fail in a disastrous way.
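As a rough way to check such a remote Solr server from outside Drupal before pointing Search API Solr at it, something like the following Python sketch could help. It only builds a request against Solr's standard `/solr/<core>/select` endpoint and a Basic Auth header; the host name, core name and credentials are placeholders I made up, and whether you need authentication at all depends on how you set up the server.

```python
import base64
from typing import Dict
from urllib.parse import urlencode


def solr_select_url(base_url: str, core: str, query: str, rows: int = 10) -> str:
    """Build a URL for Solr's standard select handler on the given core."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/solr/{core}/select?{params}"


def basic_auth_header(user: str, password: str) -> Dict[str, str]:
    """Build an HTTP Basic Auth header (only needed if Solr auth is enabled)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder values: replace with your own server, core and credentials.
url = solr_select_url("https://solr.example.org:8983/", "archipelago", "*:*", rows=1)
headers = basic_auth_header("solr_user", "change-me")

# To actually send the request (requires network access to your server):
#   import urllib.request
#   req = urllib.request.Request(url, headers=headers)
#   with urllib.request.urlopen(req, timeout=10) as resp:
#       print(resp.status)
```

If that request succeeds from a machine outside your network, the same URL pieces (scheme, host, port, core) and credentials are what you would enter in the Search API Solr server configuration.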

Sadly, I do not have more experience with Pantheon; I only know that many of their packages are custom replacements for the vanilla Drupal ones.

Please let us know (any time) if you need any help or have any other questions as you start your exploration and your journey working with Archipelago and Strawberryfields; we are very happy to help. We also have a Slack channel in case you have pressing things to ask and discuss.

best

Diego Pino
Archipelago Architect/ Assistant Director for Digital Strategy
Metropolitan New York Library Council