Request: Schema.org NewsArticle Subset


Marwah Sulaiman

Apr 21, 2026, 6:19:01 AM
to Web Data Commons

Dear WebDataCommons Team,

I came across your Schema.org subsets page (2024 release), where you suggest contacting you for classes not listed. I’m currently conducting research related to fact-checking and semantic annotations, where access to the NewsArticle data would be very helpful.

Would it be possible to access this subset, or could you advise on how to obtain it?

Thank you for your time.

Best regards,
Marwah Sulaiman

Chris Bizer

Apr 24, 2026, 7:34:54 AM
to Web Data Commons
Dear Marwah,

Unfortunately, we did not extract the NewsArticle data for the 2024 release, and we currently do not have the capacity to extract and publish additional subsets.

To extract the data yourself, you would need to do the following:
1. Download the JSON-LD and Microdata parts of the 2024 release from https://webdatacommons.org/structureddata/2024-12/stats/how_to_get_the_data.html
2. Run the subset creation Jupyter notebook found at https://github.com/wbsg-uni-mannheim/SubsetCreatorJupyterNBs for the subsets you are interested in.
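
As a rough illustration of what step 2 amounts to (this is not the notebook's actual code, only a minimal sketch), the following Python filters one gzipped N-Quads file for entities typed as NewsArticle. It assumes the downloaded parts are gzip-compressed N-Quads with one quad per line; the names split_quad and extract_class and the file names in the usage note are hypothetical. It keeps only quads whose subject is itself typed as NewsArticle, so nested entities (authors, publishers, images) are not followed.

```python
import gzip
import re

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
# schema.org terms occur with both http and https prefixes on the web.
NEWS_ARTICLE = re.compile(r"<https?://schema\.org/NewsArticle>")


def split_quad(line):
    """Split one N-Quads line into (subject, predicate, object, graph).

    Returns None for lines that do not look like a complete quad.
    Assumes subject, predicate and graph contain no whitespace, which
    holds for IRIs and blank node labels.
    """
    line = line.strip()
    if not line.endswith("."):
        return None
    body = line[:-1].rstrip()
    try:
        subject, predicate, rest = body.split(None, 2)
        obj, graph = rest.rsplit(None, 1)
    except ValueError:
        return None
    return subject, predicate, obj, graph


def extract_class(in_path, out_path, type_pattern=NEWS_ARTICLE):
    """Copy all quads of entities matching type_pattern to out_path.

    Two passes over the input: the first collects (subject, graph) keys
    of matching entities, the second writes every quad with such a key.
    Returns the number of quads written.
    """
    wanted = set()
    with gzip.open(in_path, "rt", encoding="utf-8", errors="replace") as src:
        for line in src:
            quad = split_quad(line)
            if quad and quad[1] == RDF_TYPE and type_pattern.fullmatch(quad[2]):
                # Blank node labels may repeat across pages, so key on the graph too.
                wanted.add((quad[0], quad[3]))

    kept = 0
    with gzip.open(in_path, "rt", encoding="utf-8", errors="replace") as src, \
            gzip.open(out_path, "wt", encoding="utf-8") as dst:
        for line in src:
            quad = split_quad(line)
            if quad and (quad[0], quad[3]) in wanted:
                dst.write(line)
                kept += 1
    return kept
```

A call such as extract_class("part_0.gz", "newsarticle_part_0.nq.gz") would then be repeated for every downloaded part. The set of matching keys is held in memory, so very large parts may need a different strategy.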

Best regards,
Chris