Hi all,
I am trying to get the co-occurrence values for word pairs by reading the binary cooccurrence.bin file in Python. This file is produced in the third step, by running the cooccur program.
Has anyone tried this? It seems like there are three values for each pair plus an index:
typedef struct cooccur_rec_id {
    int word1;
    int word2;
    real val;
    int id;
} CRECID;
When writing, though, I see only three values:
- index of the first word (integer - 4 bytes)
- index of the second word (integer - 4 bytes)
- co-occurrence value (real - 4 bytes? 8 bytes?)
This is what I inferred by looking at the cooccur program.
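To narrow down the size question, the two candidate layouts can be compared with Python's struct module. This is only a sketch: the format strings assume little-endian data with no padding between fields, and RECORD_FORMATS is a name made up for illustration.

```python
import struct

# Hypothetical candidate layouts for one record: two 4-byte ints followed by
# the co-occurrence value. "<" means little-endian with no padding (assumption).
RECORD_FORMATS = {
    "real is float": "<iif",
    "real is double": "<iid",
}

# Size in bytes of one record under each assumption.
record_sizes = {name: struct.calcsize(fmt) for name, fmt in RECORD_FORMATS.items()}

for name, size in record_sizes.items():
    print(f"{name}: {size} bytes per record")
```

Under these assumptions a float value gives 12 bytes per record and a double gives 16, so a 16-byte stride would point to the double layout.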
I can't seem to get the right number of bytes to read. When I read 16 bytes at a time, I get the integers correctly, but the co-occurrence value doesn't make sense.
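Here is a minimal sketch of the kind of read I would expect to work, assuming real is a C double, so that each record is int, int, double (16 bytes) in little-endian order with no padding. The function name read_cooccurrences is a placeholder of mine, not something from the cooccur code.

```python
import struct

# Assumed record layout: int word1, int word2, double val
# (little-endian, no padding between fields).
RECORD = struct.Struct("<iid")


def read_cooccurrences(path):
    """Yield (word1, word2, val) tuples from a binary co-occurrence file."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break  # end of file (or a truncated trailing record)
            yield RECORD.unpack(chunk)
```

Looping over read_cooccurrences(path) with the path to the .bin file would then give one (word1, word2, val) tuple per record. If the value is unpacked as a 4-byte float instead, the integers can still look right while the value comes out as garbage, which matches what I am seeing.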
Has anyone tried this? Any help would be appreciated.
Murat