Web Data Commons
1–30 of 128
Shawn Lucas, Sebastian Nagel
3
Aug 4
WWW Ranking - Update
Thanks Sebastian! Super helpful, really appreciate it. I'll take a look at both links, play with
kenn tatami
Jun 18
Feedback: Can you provide a code/pseudo code for pair selection algorithm?
Hi all, As a newcomer to the field, I have some feedback to the document: "WDC Product Data
Fantin Charles, Anna Primpeli
9
Jun 3
Question about amazon ec2 instances that doesn't "work"
Hi Anna, After verification my S3 bucket is configured to be public. I ran the complete startup code.
Gan Gao, Chris Bizer
3
May 25
Question about number of product categories
Thanks so much Chris! Do you know if we have mapping between the Goldstandard 222 to Google product
ottowg...@gmail.com, apri...@gmail.com
2
May 16
Type specific subsets Format Selection is missing in 2021
Dear Wolf, Thank you for your message! In which specific json schema.org subsets would you be
Lewis John Mcgibbney
Mar 4
CVE-2022-25312: An XML external entity (XXE) injection vulnerability exists in the Apache Any23 RDFa XSLTStylesheet extractor
Description: An XML external entity (XXE) injection vulnerability was discovered in the Any23 RDFa
Lewis John Mcgibbney
Mar 4
[ANNOUNCE] Apache Any23 2.7
The Apache Any23 Project Management Committee is pleased to announce the release of Apache Any23 2.7.
Lewis John Mcgibbney
Mar 3
Website up-to-date?
Hi WDC Team, I was recently showing WDC to colleagues during a presentation. I noticed that the
Anna Primpeli, Kelly Murphy
3
Jan 31
[ANN] WDC Product Data Corpus V.2020 released
Hello Kelly, Thank you for your e-mail and interest in the corpus! Unfortunately, we haven't
apri...@gmail.com, Lewis John Mcgibbney
2
Jan 12
[ANN] WebDataCommons releases 82.1 billion quads Microdata, Embedded JSON-LD, RDFa, and Microformat data originating from 14.6 million websites
Congratulations Anna, Alexander and Chris on this release. The analysis is very interesting indeed.
Lewis John Mcgibbney
Jan 10
[ANNOUNCE] Apache Any23 2.6 Release
The Apache Any23 Team is pleased to announce the release of Apache Any23 2.6. Apache Anything To
Arindam Mitra
10/11/21
web data for medical types
Hi, Does the "WDC RDFa, Microdata, Embedded JSON-LD, and Microformats Data Sets" contain
Matt
9/20/21
Question about web data commons and prefixes
Do the results include meta tags with property og:title, og:url, etc. where the og: prefix has not
Lewis John Mcgibbney
9/10/21
CVE-2021-40146: A Remote Code Execution (RCE) vulnerability exists in Apache Any23 YAMLExtractor.java
Description: A Remote Code Execution (RCE) vulnerability was discovered in the Any23 YAMLExtractor.
Lewis John Mcgibbney
9/10/21
CVE-2021-38555: An XML external entity (XXE) injection vulnerability exists in Apache Any23 StreamUtils.java
Severity: critical Description: An XML external entity (XXE) injection vulnerability was discovered
Lewis John Mcgibbney
9/10/21
[ANNOUNCE] Apache Any23 2.5 Release
What? The Apache Any23 Team is pleased to announce the release of Apache Any23 2.5. Apache Anything
Paul McCarthy
9/1/21
[ Hyperlink Graph 2012] Error downloading Katz Ranking.gz
Dear Web Data Commons Group, Thanks for making this amazing resource available to the research
Michal Turski
7/15/21
[Web Table Corpus 2015] Header/key column in vertical tables
Hello I have problem with interpretation this json describing some table in Web Table Corpus 2015: ``
Wing Wong, …, rall...@googlemail.com
5
5/21/21
problem on loading the dataset
Hi Wing, Is it really necessary for you to load this corpus file? All of the training/validation/test
Julian Takehana Toya Angeles
5/17/21
[wdc12] Problem downloading one file
Hey there, I've been having trouble downloading one of the smaller files. It always gets to 82%
rall...@googlemail.com
3/29/21
[ANNOUNCEMENT] WDC Schema.org Table Corpus released
Hi all, we are happy to announce the release of the Web Data Commons Schema.org Table corpus. The
Lewis John Mcgibbney
3/9/21
RDF / Linked data track @ ApacheCon
Hi Folks, I would like to encourage you to submit content to the RDF / Linked data track @ApacheCon.
Mubashara Akhtar, Anna Primpeli
3
3/2/21
Web Table Corpora
Many thanks for the quick response! Best regards, Mubashara From: web-data-commons@googlegroups.com
apri...@gmail.com, …, Zhichao Han
5
2/26/21
WebDataCommons releases 86.3 billion quads Microdata, Embedded JSON-LD, RDFa, and Microformat data originating from 15.3 million websites
Hello, as our datasets are extracted from the Common Crawl web corpus which contains pages from
Lewis John Mcgibbney
10/6/20
[ANNOUNCEMENT] Apache Any23 2.4 Release
The Apache Any23 Team is pleased to announce the release of Apache Any23 2.4. Apache Anything To
Z Z
7/13/20
Call for Participation SWC-MWPD 2020 (ROUND 1 RESULTS AND ROUND 2 CfP) Semantic Web Challenge on Mining the Web of HTML-embedded Product Data (@ISWC2020)
ROUND 2 Call for Participation: Mining the Web of HTML-embedded Product Data (co-located with
Егор Еремеев, Ralph Peeters
3
7/11/20
How to enrich the WDC Product Data Corpus with product images data?
Hello, Ralph, Thank you for explanation and suggestion of possible approach. They make things clearly
Neiman Tal, Ralph Peeters
6
5/4/20
Reproducing the Baseline experiments on the Gold Standards
Yes, exactly. On Friday, April 24, 2020 at 5:34:23 AM UTC-4, Ralph Peeters wrote: Hi Tal, What
Ziqi Zhang
4/24/20
2nd CfP: Semantic Web Challenge (ISWC2020) - Mining the Web of HTML-embedded Product Data
2nd Call for Participation: Mining the Web of HTML-embedded Product Data (co-located with ISWC2020) (
Karim Ratib, Anna Primpeli
3
4/3/20
Schema.org ClaimReview
Thanks Anna - I understand the statistics are useful to determine prioritization on your side. We