We are honored to introduce our new work,
OAG-BERT, a pre-trained language model built on the Open Academic Graph (OAG). The recent development of pre-trained language models such as BERT has revolutionized natural language processing, and domain-specific language models have surged as well, including several BERTs pre-trained on scientific corpora, such as SciBERT and BioBERT.
However, we believe that knowledge of entities such as papers, authors, fields of study, venues, and affiliations is essential for academic mining and analysis, and this entity knowledge is missing from existing scientific language models. To address this gap, we present OAG-BERT, which integrates billions of entities and relationships from OAG into BERT pre-training. We show that OAG-BERT significantly outperforms competing models on entity-related benchmarks. We discuss more details in our
paper.
OAG-BERT is now accessible via the Python CogDL package; a minimal usage sketch is shown below. We welcome you to join our
Slack group for discussion and eagerly look forward to your feedback and requests.
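
As a quick illustration, the sketch below loads OAG-BERT through CogDL and encodes a short piece of text. It assumes CogDL exposes an `oagbert` loader that returns a `(tokenizer, model)` pair; the exact import path and return values may differ across CogDL versions, so please check the documentation of your installed release.

```python
# A minimal sketch, assuming CogDL exports an `oagbert` loader
# returning a (tokenizer, model) pair; verify against your version.
import torch
from cogdl import oagbert

tokenizer, model = oagbert()
model.eval()

sequence = "CogDL is developed by KEG, Tsinghua."
tokens = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**tokens)

# The hidden states can serve as contextual embeddings
# for downstream academic mining and analysis tasks.
print(outputs[0].shape)
```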