Web Data Commons
Chillar Anand
Oct 19
WWW Ranking - API
Hi, Which crawl did you use for WWW Ranking? Is it updated for every crawl? Is there a git repo for
M Martorana
Oct 2
Information about T2D-SM-WH
Hello there, I have some questions about the T2D-SM-WH dataset that I was hoping you could help me
Alexander Brinkmann, Dan Brickley (5)
Sep 9
[ANN] WebDataCommons releases 86.4 billion quads Microdata, Embedded JSON-LD, RDFa, and Microformat data originating from 14.2 million websites
Hello! So sorry for the late reply - crazy year. Ok, year and a half … Dare I ask for more? Sure, why
Shivam Sharma, Chris Bizer (2)
Jun 12
Download Domain specific datasets
Hello, the WDC schema.org table corpus arranges the data per domain, eg it consists of tables
Shivam Sharma
Mar 28
Extraction error in jsonld format file from October 2023 Common Crawl Corpus
Hi all, I am facing an extraction error for this file. When using the `gzip` command on Linux to
Alexander Brinkmann
Feb 6
[ANN] WebDataCommons releases 97.7 billion quads Microdata, Embedded JSON-LD, RDFa, and Microformat data originating from 14.6 million websites
Hi all, we are happy to announce the new release of the WebDataCommons Microdata, JSON-LD, RDFa and
Alexander Brinkmann
5/1/23
Web Data Commons presented at the ACM Web Conference (WWW2023) in Austin, Tx, USA
Hello, this is a message to subscribers of this list who attend the WWW2023 conference in Austin, Tx,
Yuting Zhao, Ziqi Zhang (3)
2/14/23
Request for Rakuten SIGIR2018 dataset
Thanks for your quick feedback! Yes, I have emailed Yiu-Chang Lin directly, however, this email
rall...@googlemail.com
1/26/23
[ANN] WDC Products: Multi-Dimensional Entity Matching Benchmark released
Hi all, We are happy to announce the release of the multi-dimensional WDC Products Benchmark for
Vladimir Alexiev, rall...@googlemail.com (6)
12/22/22
wrong 2021-12/stats/schema_org_subsets
The problem should be fixed across all WDC pages now and downloads should work on every modern
kenn tatami, rall...@googlemail.com (2)
11/21/22
Feedback: Can you provide a code/pseudo code for pair selection algorithm?
Hi Tatsuhiko, Thank you for your valuable feedback and I am sorry about missing your post until now.
Ahmad Alobaid
11/16/22
Typo in the SOTAB webpage
Hi, I just wanted to report a typo in the table. CTA_sample_labels should be CPA_sample_labels. https
Ratan Sebastian, Alexander Brinkmann (3)
11/10/22
Rate/Connection Limits
Good to know. Thanks Alexander. On Thursday, November 10, 2022 at 11:48:20 AM UTC+1 alexander....@
Shawn Lucas, … Chillar Anand (4)
11/8/22
WWW Ranking - Update
All the posts related to WWW-Ranking are tagged in https://commoncrawl.org/category/web-graph/ If
Kiara Grouwstra, Robert Meusel (2)
10/27/22
Schema.org ClaimReview and StructuredDataProfiler
Hi Kiara, Looks like the Links is wrong. Can you check this one out: https://github.com/wbsg-uni-
Fantin Charles, Anna Primpeli (9)
6/3/22
Question about amazon ec2 instances that doesn't "work"
Hi Anna, After verification my S3 bucket is configured to be public. I ran the complete startup code.
Gan Gao, Chris Bizer (3)
5/25/22
Question about number of product categories
Thanks so much Chris! Do you know if we have mapping between the Goldstandard 222 to Google product
ottowg...@gmail.com, apri...@gmail.com (2)
5/16/22
Type specific subsets Format Selection is missing in 2021
Dear Wolf, Thank you for your message! In which specific json schema.org subsets would you be
Lewis John Mcgibbney
3/4/22
CVE-2022-25312: An XML external entity (XXE) injection vulnerability exists in the Apache Any23 RDFa XSLTStylesheet extractor
Description: An XML external entity (XXE) injection vulnerability was discovered in the Any23 RDFa
Lewis John Mcgibbney
3/4/22
[ANNOUNCE] Apache Any23 2.7
The Apache Any23 Project Management Committee is pleased to announce the release of Apache Any23 2.7.
Lewis John Mcgibbney
3/3/22
Website up-to-date?
Hi WDC Team, I was recently showing WDC to colleagues during a presentation. I noticed that the
Anna Primpeli, Kelly Murphy (3)
1/31/22
[ANN] WDC Product Data Corpus V.2020 released
Hello Kelly, Thank you for your e-mail and interest in the corpus! Unfortunately, we haven't
apri...@gmail.com, Lewis John Mcgibbney (2)
1/12/22
[ANN] WebDataCommons releases 82.1 billion quads Microdata, Embedded JSON-LD, RDFa, and Microformat data originating from 14.6 million websites
Congratulations Anna, Alexander and Chris on this release. The analysis is very interesting indeed.
Lewis John Mcgibbney
1/10/22
[ANNOUNCE] Apache Any23 2.6 Release
The Apache Any23 Team is pleased to announce the release of Apache Any23 2.6. Apache Anything To
Arindam Mitra
10/11/21
web data for medical types
Hi, Does the "WDC RDFa, Microdata, Embedded JSON-LD, and Microformats Data Sets" contain
Matt
9/20/21
Question about web data commons and prefixes
Do the results include meta tags with property og:title, og:url, etc. where the og: prefix has not
Lewis John Mcgibbney
9/10/21
CVE-2021-40146: A Remote Code Execution (RCE) vulnerability exists in Apache Any23 YAMLExtractor.java
Description: A Remote Code Execution (RCE) vulnerability was discovered in the Any23 YAMLExtractor.
Lewis John Mcgibbney
9/10/21
CVE-2021-38555: An XML external entity (XXE) injection vulnerability exists in Apache Any23 StreamUtils.java
Severity: critical Description: An XML external entity (XXE) injection vulnerability was discovered
Lewis John Mcgibbney
9/10/21
[ANNOUNCE] Apache Any23 2.5 Release
What? The Apache Any23 Team is pleased to announce the release of Apache Any23 2.5. Apache Anything
Paul McCarthy
9/1/21
[ Hyperlink Graph 2012] Error downloading Katz Ranking.gz
Dear Web Data Commons Group, Thanks for making this amazing resource available to the research