cooccurrence.bin file

Murat Aydogdu

Sep 26, 2023, 11:01:36 AM
to GloVe: Global Vectors for Word Representation
Hi all,
I am trying to get the co-occurrence values for word pairs by reading the binary cooccurrence.bin file in Python. This file is produced in the third step, by running the cooccur program.

Has anyone tried this? It seems like there are three values for each pair plus an index:
typedef struct cooccur_rec_id {
    int word1;
    int word2;
    real val;
    int id;
} CRECID;

When writing, though, I see only three values per record:
- index of the first word (integer - 4 bytes)
- index of the second word (integer - 4 bytes)
- co-occurrence value (real - 4 bytes? 8 bytes?)

This is what I inferred by looking at the cooccur program.
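
The "4 bytes? 8 bytes?" question can be narrowed down by computing the record size for each possibility. The following is a minimal sketch using Python's ctypes; the helper `record_sizes` and the class names are made up for illustration, and it assumes the C compiler used the platform's default struct padding.

```python
import ctypes

def record_sizes(real_type):
    """Return (size without id, size with id) for a given C type of `real`."""
    class Rec(ctypes.Structure):      # word1, word2, val
        _fields_ = [("word1", ctypes.c_int),
                    ("word2", ctypes.c_int),
                    ("val", real_type)]

    class RecId(ctypes.Structure):    # word1, word2, val, id
        _fields_ = [("word1", ctypes.c_int),
                    ("word2", ctypes.c_int),
                    ("val", real_type),
                    ("id", ctypes.c_int)]

    return ctypes.sizeof(Rec), ctypes.sizeof(RecId)

# Compare both candidate widths for `real`.
for name, ctype in [("float", ctypes.c_float), ("double", ctypes.c_double)]:
    without_id, with_id = record_sizes(ctype)
    print(f"real = {name}: {without_id} bytes without id, {with_id} bytes with id")
```

On a typical 64-bit machine, a 16-byte record fits either three fields with an 8-byte real or four fields with a 4-byte real, so the stride alone does not settle which layout is on disk.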
I can't seem to work out the right number of bytes to read per record. Reading 16 bytes at a time, I get the integers correctly, but the co-occurrence value doesn't make sense.
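
One way to test the three-value reading is sketched below. It assumes each record is two 4-byte ints followed by an 8-byte double (that is, `real` being a `double`), stored in native byte order with no `id` field. That layout is an assumption to check against the `typedef` of `real` in your copy of the sources, and `read_cooccurrences` is a hypothetical helper name.

```python
import struct

# Assumed layout: int32 word1, int32 word2, float64 val.
# "=" means native byte order with standard sizes and no padding (16 bytes).
RECORD = struct.Struct("=iid")

def read_cooccurrences(path):
    """Yield (word1, word2, val) tuples from a binary co-occurrence file."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break  # end of file, or a truncated trailing record
            yield RECORD.unpack(chunk)
```

Looping with `for w1, w2, val in read_cooccurrences("cooccurrence.bin")` and printing the first few records is a quick sanity check: under the right layout the values should be positive and plausible. If `real` turns out to be a 4-byte float with a trailing `id`, the format string would be `"=iifi"` instead, which is also 16 bytes.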

Has anyone managed to read this file? Any help would be appreciated.
Murat