Dear OGB team,
I am doing some work on OGB datasets, and I really appreciate your great work. I am currently focusing on the ogbn-arxiv dataset.
I noticed that you obtain the feature vector of each node by simply averaging the embeddings of the words in the paper's title and abstract, where the word embeddings are generated by a skip-gram model. This is a simple way to obtain features and may lose information, as the relatively weak MLP results suggest.
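To make sure I understand the construction correctly, here is a toy sketch of what I mean: each node feature is the mean of the skip-gram word vectors of the title and abstract. The tiny vocabulary and 2-dimensional vectors below are made up purely for illustration.

```python
import numpy as np

# Hypothetical skip-gram word embeddings (real ones are 128-dimensional).
word_emb = {
    "graph":    np.array([1.0, 0.0]),
    "neural":   np.array([0.0, 1.0]),
    "networks": np.array([1.0, 1.0]),
}

def avg_feature(text: str) -> np.ndarray:
    """Node feature = mean of the embeddings of the in-vocabulary words."""
    vecs = [word_emb[w] for w in text.lower().split() if w in word_emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(2)

feat = avg_feature("Graph Neural Networks")
# feat is the elementwise mean of the three vectors above
```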
I am trying to figure out how a node's neighbors and its own features each affect the classification process, so I wonder what performance a model can achieve when it only leverages each node's own information (i.e., with no graph structure). I am thinking of feeding the titles and abstracts to BERT, classifying the papers with it, and seeing what happens.
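For concreteness, this is roughly the text-only baseline I have in mind, assuming the titles and abstracts become available. The model name, hyperparameters, and helper names are just placeholders, not a final design; ogbn-arxiv has 40 subject-area classes.

```python
def make_input(title: str, abstract: str) -> str:
    """Concatenate title and abstract into one sequence for the encoder."""
    return title.strip() + " [SEP] " + abstract.strip()

def classify_papers(texts, labels, num_classes=40):
    """One forward pass of a BERT classifier over raw paper text.

    Imports are kept local so the helper above runs without these
    heavy dependencies installed.
    """
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=num_classes)
    enc = tok(texts, truncation=True, padding=True, max_length=512,
              return_tensors="pt")
    out = model(**enc, labels=torch.tensor(labels))
    # Fine-tune by backpropagating out.loss; out.logits gives predictions.
    return out.loss, out.logits
```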
Thus, I now need the title and abstract of each paper in the ogbn-arxiv dataset. After checking the MAG website, I found it hard to fetch them, so I am writing to ask whether you could help me with this. Any other advice would also be highly appreciated.