Inquiry Regarding Statistics on the Latest OAG Dataset

Zhang Shiqi

May 31, 2024, 5:40:46 AM
to fan...@tsinghua.edu.cn, open-acad...@googlegroups.com
Dear Fanjin,

I hope this message finds you well. This is Shiqi from National University of Singapore.

I am writing to inquire about some specific details regarding the latest release of the OAG dataset. Specifically, I am interested in obtaining statistics on the following aspects:

- Subjects Covered: Could you provide a breakdown of how many subject categories are included in the latest dataset? Additionally, I would like to know the taxonomy or classification system used for these subjects.

- Recency of Papers: How current are the papers included in this dataset? Does the dataset cover all papers up until February 2024, or is there a different cut-off date?

Your assistance in providing these details would be greatly appreciated, as it would significantly aid our research and analysis.

Thank you for your time and support.

Best regards,
Shiqi

zfj...@gmail.com

May 31, 2024, 6:03:31 AM
to Open Academic Graph
Hi Shiqi,

Thanks for your question!

1. The latest release does not include separate subject categories. Instead, the paper data contains a "keywords" field extracted from the paper texts.

2. The dataset covers papers until February 2024.

Shiqi Zhang

Jun 2, 2024, 3:03:24 AM
to Open Academic Graph
Hi Fanjin,

Thanks for your answer.

I noticed that your latest work, OAG-Bench, mentions a concept taxonomy.
May I ask if this taxonomy is also open-sourced?

Best,
Shiqi