Welcome message

Shivam Sharma

Jun 24, 2025, 8:13:29 AM
to WHALE Embeddings

Hi all,

We are excited to welcome you to the WHALE-Embeddings Google Group!

WHALE is a new large-scale resource of knowledge graph embeddings generated from the Web Data Commons dataset — currently the largest collection of structured data extracted from the public web. The dataset encompasses nearly 98 billion RDF triples extracted from over 22 million domains, presenting a significant scalability challenge for knowledge graph embedding algorithms.

To address this challenge, we employed DeCal, a state-of-the-art knowledge graph embedding model, to generate high-quality embeddings over this unprecedentedly large dataset. The resulting embeddings, dubbed WHALE-embeddings, are publicly available at:

👉 https://embeddings.cc/

👉 https://files.dice-research.org/datasets/WHALE/WDC/embeddings/

WHALE-embeddings aim to contribute to the community by:

  • Providing the largest publicly available knowledge graph embedding resource to date.

  • Enabling scalable experimentation and benchmarking for downstream tasks such as entity linking, clustering, and search.

  • Facilitating research on large-scale embedding algorithms, representation learning, and web-scale knowledge graphs without requiring users to process massive raw datasets themselves.

  • Supporting further study of structural patterns and deployment trends of web-extracted RDF data.
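
To give a feel for the search use case mentioned above, here is a minimal sketch of nearest-neighbor lookup by cosine similarity. It makes several assumptions: the entity names and vectors are made-up toy values, the embeddings are treated as plain real-valued vectors already loaded into memory, and the helper names `cosine_similarity` and `nearest_neighbors` are hypothetical. It does not reflect the actual file format or dimensionality of the published WHALE-embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, embeddings, k=2):
    """Return the names of the k entities most similar to `query`, best first."""
    scored = [
        (cosine_similarity(query, vector), name)
        for name, vector in embeddings.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Toy, made-up 3-dimensional vectors (assumption for illustration only).
toy_embeddings = {
    "ex:Paderborn": [0.9, 0.1, 0.0],
    "ex:Bielefeld": [0.8, 0.2, 0.1],
    "ex:Whale": [0.0, 0.1, 0.9],
}

neighbors = nearest_neighbors([1.0, 0.0, 0.0], toy_embeddings, k=2)
print(neighbors)
```

At web scale, a brute-force scan like this would likely be replaced by an approximate nearest-neighbor index; the sketch only illustrates the idea.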

The WHALE-embeddings project is part of an ongoing research effort led by the DICE research group at the University of Paderborn.

We would like to thank:

  • Web Data Commons for making their large-scale web extractions publicly available.

  • The broader open-source community whose tools and datasets made this work possible.

We plan to regularly update WHALE-embeddings as new data becomes available, and we look forward to discussions, feedback, and collaboration within this group. Feel free to introduce yourself, share ideas, ask questions, or suggest directions for future work!

Warm regards,
The WHALE-Embeddings Team
