When I download a generic document, its name has the format filename.sc.xz.gpg:
- gpg: the file was encrypted, so I decrypted it with the appropriate key;
- xz: the file was compressed, so I decompressed it;
- sc: the file was serialized, so I must deserialize it.
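To make the layering concrete, here is a minimal Java sketch (my own illustration, not taken from the documentation) that reads the suffixes from right to left and lists the step each one implies. The class name SuffixLayers and the method undoSteps are hypothetical, and the code only names the steps; it does not decrypt, decompress, or deserialize anything.

```java
import java.util.ArrayList;
import java.util.List;

public class SuffixLayers {
    // Returns the steps needed to unwrap the file, outermost layer first.
    // The suffix-to-step mapping follows the list above.
    static List<String> undoSteps(String fileName) {
        List<String> steps = new ArrayList<>();
        String name = fileName;
        while (true) {
            if (name.endsWith(".gpg")) {
                steps.add("decrypt (gpg)");
                name = name.substring(0, name.length() - ".gpg".length());
            } else if (name.endsWith(".xz")) {
                steps.add("decompress (xz)");
                name = name.substring(0, name.length() - ".xz".length());
            } else if (name.endsWith(".sc")) {
                steps.add("deserialize (thrift)");
                name = name.substring(0, name.length() - ".sc".length());
            } else {
                break; // no known suffix left
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        // prints [decrypt (gpg), decompress (xz), deserialize (thrift)]
        System.out.println(undoSteps("filename.sc.xz.gpg"));
    }
}
```

The first two steps are the ones I have already done; only the last one is still open.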
The problem is the deserialization step. In the documentation I read that the data was serialized with Thrift.
My question is: how can I use Thrift to deserialize my file filename.sc?
I want to extract just the content of the news, forums, etc. and maybe save it to a text file. In an old post (called "decrypting the corpus") I read that the local classes in Java and Python are recommended. I would prefer to use Java. Can you help me, please?
Thanks John!!
Matteo