Not sure I'm going to assist the discussion so much as add more questions. Thanks for posting your query, and thanks to those who have responded. My reflections are below.
--
Interesting post by Rosalyn Metz, and she gets to the nub of the problem quickly: forces for private vs public interest in access are in tension. I've been trying to work out what's behind your question, Thomas: "how to engage with tech companies who want to train on library, archive, and museum collections".
So I backed up and asked: why engage, and what's in it for me (WIIFM), or for the organisation I work for and those who own or are reflected in the material? And what is it about "tech companies" that causes the concern (profit is a known and reasonable outcome from industry, but it can be a diabolical motivator)? Then I asked another question: what would breach the social licence a GLAM organisation has as a holder of knowledge and/or heritage? What lessons can we draw from history, and so on?
No simple answers to offer; rather, more questions: what have we learned from the tragedy of the commons, and what does it cost us to jeopardise rights over research and heritage collections (knowledge and information)? I arrive at a loss of trust, and understand that a rights owner might be much more cautious about licensing their works for publishing online. The ultimate result will be an inhibiting effect (knowledge needs to flow, but how quickly, to whom, and why?).
Where do I land? I have accepted that as a community we will need to look at alternatives and establish gates and new access models, to be able to protect interests and to trust third parties to operate ethically in an open environment (private or public). Gates, alternate models, and pathways still make resources available; the extra step is the negotiation, and the tempering of availability, that gets put in.
So, why should public interests dominate and prevail by being judicious in entering into agreements? My 2c: trust is a fundamental social institution (hard to establish, easy to lose).
Frankly, the conversation about expanding our commitments to mediated access, where material can still be made available but in a gated or controlled arrangement, isn't getting much airtime (or if it is, I'm missing out on it!). Yet the mediated model is used heavily in research, where matters of sensitivity, whether personal or commercial, come into play around data and software, and it has well-established norms in heritage in a physical sense. The tech stack to do this is all there and is very mature, and so is a risk model (the Five Safes Framework).
What's missing is the rationale and models as types for joint ventures, and an acceptance of the need to negotiate and establish new norms for mediation. When working with researchers in eScience/Research to support the release of open data, some useful case studies emerged, e.g., an annual survey of religious practices was released, but at such a high level as to avoid identifying people (useful, protective); and a small sample of images from a photographic collection was released to indicate the range of material (useful, protective).
We (at the National Museum of Australia) are commencing digitising a card catalogue with very sensitive First Nations information on it that comes with a very complex ethical backdrop and history. It is fairly safe to say the datasets will never be in the hands of a tech company. The same goes for the work with audio in film, TV, oral histories etc. at the NFSA, a rich mix of public interest, commercial, and private opinion. I can only imagine none, or only a very small portion, of that ever being made available to tech companies without negotiation. Given this, maybe there are substantive delineating factors that separate research and heritage collections. Not sure; maybe not in the end. For both, it is important to have some level of openness for many good reasons, and some level of mediation for many other good reasons. The question is then: how mediated, how open, and how to negotiate and communicate that?
It is not as if we've had a perfect state of making publicly known all that is in a collection, let alone the collection itself. Many times I have learned about indexes and catalogues inside institutions that have never seen the light of day. We have history in this regard, and it has not always been about lack of funds and will; it has also been about protecting interests and rights (or shame and inertia).
I'd welcome hearing others' thoughts on this front. I have just recently relayed this message to a tech company rep: that the context for copyright in Australia and Aotearoa New Zealand, and the legacies of colonialism, are going to be defining features in this terrain, and that I am very focused on "useful and protective" as matters of balance. So it comes back to: where's the harm, and who is going to be harmed? Care ethics are written all over this space, for us to interrogate and deliberate on.