Kick-Off Meeting of the DDI Qualitative Data Working Group

Noemi Betancort

Sep 12, 2025, 10:04:58 AM
to DDI Users Group ICPSR
Dear DDI Community and members of the Qualitative Data Working Group,

You recently received an invitation from Knut Wenzig to the first meeting of the new Metadata Publication and Access Working Group (MAPWG).

Today, I would like to extend a similar invitation to you, this time for the kick-off meeting of the newly established Qualitative Data Working Group (QDWG), which was also recently approved by the DDI Scientific Board.

📅 Tuesday, September 30, 2025
🕓 4:00 p.m. CEST (Berlin, UTC+2)
📍 via Zoom https://uni-bremen.zoom-x.de/j/63284701305?pwd=EehzWI52hcarxNMJj3EeFl0wmtuEKb.1


You can find a description of our group, its purpose, and its current members, as well as our Terms of Reference, using the following link: https://ddi-alliance.atlassian.net/wiki/spaces/DDI4/pages/3672113154/Qualitative+Data+Working+Group+QDWG

Preliminary agenda:
  • Group introduction: Background, purpose, objectives, deliverables and initial members
  • Organizational matters: Chair and Co-Chair nominations, communication channels and meeting frequency
  • Participation/collaboration with other DDI (and external) working groups
  • Discussion about which topics interest you and establishing priorities
  • EDDI25
I’m looking forward to launching this project and welcoming everyone's collaboration.

You are invited to attend the inaugural meeting, where you can find out more about the group and decide if you would like to become a member and contribute to our goals. 
Your support would be greatly appreciated.


Best wishes,
Noemi (interim Chair)

--
Noemi Betancort Cabrera
Data and systems librarian - QualidataNet metadata manager

Staats- und Universitätsbibliothek Bremen
Digitale Dienste
Bibliothekstraße 9
28359 Bremen

Tel. 0421/218-59592
Fax. 0421/218-98 59592

Hoyle, Larry

Feb 13, 2026, 8:08:57 AM
to Noemi Betancort, DDI Users Group ICPSR

 

For those of you interested in qualitative data, the New York Times article “How The Times Is Digging Into Millions of Pages of Epstein Files” is an example of analyzing a huge corpus using both AI and human tools. The corpus includes “three million pages, 180,000 images and 2,000 videos”. It’s online, but posted in a way that makes retrieval complicated (not surprising).

 

Here is a link, although it might be paywalled.

https://www.nytimes.com/2026/02/12/insider/jeffrey-epstein-files-documents.html

 

The kind of metadata that might facilitate working with a huge trove like this is worth considering.

 

Larry

Noemi Betancort

Feb 16, 2026, 9:08:01 AM
to Hoyle, Larry, DDI Users Group ICPSR

Thanks for sharing, Larry!

Access to such large data sets is now more common, which has made qualitative data analysis substantially more arduous, given all the preparatory steps you have to take to get the materials ready for analysis. In the article, we see how many of these steps can be carried out by leveraging A.I.

Trump. Clinton. Gates. Duke of York. My colleagues and I came up with a list of those terms and others about prominent people, places and events that involved Epstein; we’ve added more every day. Some searches were more topical, seeking details on Epstein’s time in jail and death. The plan was to divide those terms and phrases among the reporters and then begin searching the files to see what we found that was new and potentially newsworthy.

The authors highlight the application of A.I. for searching, organising, synthesising, etc., but not for expert judgement. Some interesting excerpts:

The first thing we always try to do is make things searchable. But here we also needed ways for reporters to get at the things that weren’t easy targets for search. One way we did that was by leveraging something called “semantic search,” which lets reporters search for concepts and find matching text even if the exact language isn’t in the document. We also built an A.I.-powered tagging and categorization tool to bucket the documents by type and add labels for things that we thought may be useful indicators of newsworthiness [...]

With A.I., information — text, images, video, audio — is like a liquid; it can be molded into different formats and searched in rich, expressive ways. A.I. will never replace the expert judgment of reporters, but it can make their lives easier and amplify their reporting ambitions.

A.I. is really bad at news judgment — what information to include, whether it’s important. A.I. can be sloppy and make mistakes that are inexcusable in journalism. It’s super industrious but not super intelligent. A.I. outputs can amplify biases in society. And in my experience, A.I. is not great at producing original ideas (but decent at synthesizing or distilling them).

The way we use A.I. is quite different than how most people interface with Gemini and other tools. We are writing software that gives discrete tasks to A.I. that we feel comfortable the technology can handle reliably. For example, we may ask it to let us know if a page has an image or if a document is an email. The stuff we get back may help reporters get to the right material faster, but ultimately a reporter’s eyes on actual documents are what is driving every story.

Therefore, sharing interoperable metadata, the results of artificial intelligence tools, and expert reviews and annotations could save a considerable amount of time and resources, as well as improve future research on these materials. This would benefit not only journalists, but also qualitative social science researchers, linguists, historians, lawyers, and many others.

Noemi

--
Noemi Betancort Cabrera
Data and systems librarian - Qualiservice metadata manager

Staats- und Universitätsbibliothek Bremen
Digitale Dienste
Bibliothekstraße 9
28359 Bremen

Tel. 0421/218-59592
Fax. 0421/218-98 59592
