File format and parsing


wl...@infor.ecnu.edu.cn

Jul 16, 2020, 11:55:38 AM
to Open Academic Graph
Hi everyone,
I'm a beginner at data analysis. I'm a bit confused: the OAG V2 files I downloaded look like .json files to me, but the records for each author/paper don't all have the same fields. I mean some items have tags/orgs and some don't. So when I use a JSON parsing package to read them, every program throws errors.
On the other hand, the MAG files are all .txt. Can someone help me parse the data into a database? Preferably using R.
I remember that you shared your code on GitHub, but I can't find it anymore.
Thanks so much in advance!

Fanjin Zhang

Jul 17, 2020, 12:04:04 AM
to Open Academic Graph
Hi, you can parse the file one line at a time, load each author/paper as a dictionary, and then check which fields are available. For example, in Python:

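import json
import zipfile

# Assumes each file inside the archive is text with one JSON object per line.
with zipfile.ZipFile("aminer_authors_0.zip") as zf:
    for name in zf.namelist():
        with zf.open(name) as rf:
            for i, line in enumerate(rf):
                cur_author = json.loads(line)
                # Not every record has "orgs"; fall back to an empty list.
                orgs = cur_author.get("orgs", [])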
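
To get from there into a database, here is a minimal sketch using Python's built-in sqlite3 module. It assumes one JSON object per line. The "id" and "name" field names, the table layout, and the load_authors helper are all assumptions made for illustration, so check them against the fields you actually see in your files. Missing fields are stored as NULL or as an empty list instead of raising an error.

```python
import json
import sqlite3


def load_authors(lines, conn):
    """Insert one row per JSON line; absent fields become NULL or '[]'."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS authors "
        "(id TEXT PRIMARY KEY, name TEXT, orgs TEXT, tags TEXT)"
    )
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rec = json.loads(line)
        conn.execute(
            "INSERT OR REPLACE INTO authors VALUES (?, ?, ?, ?)",
            (
                rec.get("id"),    # hypothetical field name
                rec.get("name"),  # hypothetical field name
                json.dumps(rec.get("orgs", [])),  # lists stored as JSON text
                json.dumps(rec.get("tags", [])),
            ),
        )
        count += 1
    conn.commit()
    return count


# Two made-up records: the second one has no "orgs" or "tags".
sample = [
    '{"id": "a1", "name": "Ada", "orgs": ["Example University"]}',
    '{"id": "a2", "name": "Bob"}',
]
conn = sqlite3.connect(":memory:")
n = load_authors(sample, conn)
```

In practice you would pass the open file object (rf above) instead of sample, and a file path instead of ":memory:" so the table is kept on disk.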