Exception in thread "main" java.io.IOError: java.io.StreamCorruptedException: invalid stream header: 4D616769

Omar El-Begawy

Jul 15, 2015, 9:41:27 AM
to s-spac...@googlegroups.com

Hi,

I am trying to learn how to use the S-Space package. However, I get the following error when I try to read a text file in the UTF-8 format:

-------------------------------------

Exception in thread "main" java.io.IOError: java.io.StreamCorruptedException: invalid stream header: 4D616769
    at edu.ucla.sspace.util.SerializableUtil.load(SerializableUtil.java:165)
    at edu.ucla.sspace.common.SemanticSpaceIO.loadInternal(SemanticSpaceIO.java:271)
    at edu.ucla.sspace.common.SemanticSpaceIO.load(SemanticSpaceIO.java:225)
    at edu.ucla.sspace.common.SemanticSpaceIO.load(SemanticSpaceIO.java:186)
    at randomIndexing.main(randomIndexing.java:29)
Caused by: java.io.StreamCorruptedException: invalid stream header: 4D616769
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:806)
    at java.io.ObjectInputStream.<init>(ObjectInputStream.java:299)
    at edu.ucla.sspace.util.SerializableUtil.load(SerializableUtil.java:159)

-------------------------------------


I have a program which loads the file into a semantic space, but as soon as it does this, the error appears. Is there a workaround for this or how should I handle the error?

Best regards,
Omar El-Begawy
Åbo Akademi University

David Jurgens

Jul 15, 2015, 9:45:27 AM
to s-spac...@googlegroups.com
Hi Omar,

  It looks like the exception is being thrown while trying to deserialize a stored SemanticSpace object using Java deserialization. Do you know what format the file was originally saved in? (Could the SemanticSpace have been saved in text or sparse text format?)

  Thanks,
  David
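
A side note on the header value in the exception: the eight hex digits are the first four bytes that ObjectInputStream read from the file, and a Java-serialized stream always starts with the bytes AC ED 00 05. The sketch below is not part of S-Space; the class and method names (StreamHeaderCheck, decodeHeader, looksSerialized) are made up for illustration. It decodes the reported header as ASCII and checks whether four bytes match the serialization magic number, which is one quick way to tell plain text from a serialized object.

```java
import java.io.ObjectStreamConstants;
import java.nio.charset.StandardCharsets;

public class StreamHeaderCheck {

    // Turn a hex string such as "4D616769" into the ASCII text it encodes.
    static String decodeHeader(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // True if the first four bytes are the Java serialization magic number
    // (0xACED) followed by the stream version (5).
    static boolean looksSerialized(byte[] head) {
        if (head.length < 4) {
            return false;
        }
        int magic = ((head[0] & 0xFF) << 8) | (head[1] & 0xFF);
        int version = ((head[2] & 0xFF) << 8) | (head[3] & 0xFF);
        return magic == (ObjectStreamConstants.STREAM_MAGIC & 0xFFFF)
                && version == ObjectStreamConstants.STREAM_VERSION;
    }

    public static void main(String[] args) {
        // Header value copied from the exception message in this thread.
        System.out.println(decodeHeader("4D616769")); // prints "Magi"
    }
}
```

Here 4D616769 decodes to "Magi", i.e. ordinary text, which is consistent with the file not being a serialized SemanticSpace.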

Omar El-Begawy

Jul 15, 2015, 10:08:58 AM
to s-spac...@googlegroups.com

Hi,

Here is a sample from the file (it is in Swedish):

----------------------
Hollywoods långa tradition av att flyga in regissörer från Europa har fostrat praktakter.
Det är jättekul med dinosaurier. Stora, små, flygande och simmande dinosaurier.
----------------------

It is a single file in which each line contains a complete review.

I have the following program (minus the necessary imports):


public class randomIndexing{
    public static void main(String [] args) throws IOException{

    String inputfile = "samples2";
    SemanticSpace sspace = SemanticSpaceIO.load(inputfile);

    }
}

I think it should be correct, since I did it according to the example in the sample use case, but maybe there is some detail I am missing.

Br,
Omar

David Jurgens

Jul 15, 2015, 1:12:51 PM
to s-spac...@googlegroups.com
Hi Omar,

  I think I know what's going on. :)  The SemanticSpaceIO.load() method is designed to be used on a saved/serialized version of a SemanticSpace.  However, you're passing in a file with the raw corpus, which the loading functionality isn't able to process (as expected).

  To get a SemanticSpace, you'll first need to process the corpus using one of the implementations (e.g., RandomIndexing, LatentSemanticAnalysis, etc.).  The processing will depend on how you want to separate your corpus into documents, but you will need to call the processDocument() method on the SemanticSpace instance and then once you've finished passing it all the documents, call processSpace().

  There should be more detailed instructions on the wiki for doing each of these steps but let me know if any of this is unclear and I can give more detailed instructions.


  Thanks,
  David

Omar El-Begawy

Jul 16, 2015, 9:25:34 AM
to s-spac...@googlegroups.com

Thank you, David, for the help. I did as you suggested and processed the corpus through one of the implementations, and it works! The working code is below. I'll start doing some further work with this package now.

Br, Omar


-----------------------------------------

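import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.util.Properties;
import java.util.Set;

import edu.ucla.sspace.common.SemanticSpace;
import edu.ucla.sspace.common.SemanticSpaceIO;
import edu.ucla.sspace.common.SemanticSpaceIO.SSpaceFormat;
import edu.ucla.sspace.ri.RandomIndexing;
import edu.ucla.sspace.text.Document;
import edu.ucla.sspace.text.OneLinePerDocumentIterator;

public class randomIndexing {
    public static void main(String[] args) throws IOException {

        // raw corpus: one review (document) per line
        String inputfile = "nojesguiden_samples2";
        OneLinePerDocumentIterator odocit = new OneLinePerDocumentIterator(inputfile);

        RandomIndexing ri = new RandomIndexing();
        System.out.println(ri.getSpaceName());
        int numDims = ri.getVectorLength();
        System.out.println(numDims);

        // iterate through the documents in the file
        while (odocit.hasNext()) {
            Document d = odocit.next();
            BufferedReader br = d.reader();
            ri.processDocument(br);
            br.close();
        }

        // all documents have been passed in, so finish building the space
        // (no extra options are set in the Properties object)
        Properties p = new Properties();
        ri.processSpace(p);

        // print out some contents from the semantic space
        System.out.println("Hollywoods " + ri.getVector("Hollywoods"));
        Set<String> set = ri.getWords();
        for (String s : set) {
            System.out.print(s + " ");
        }
        System.out.println();

        // write the semantic space to a file...
        String filename = "ri.sspace";
        File riFile = new File(filename);
        SemanticSpaceIO.save(ri, riFile, SSpaceFormat.SPARSE_TEXT);

        // ...and read it back again
        SemanticSpace sspace = SemanticSpaceIO.load(filename);

        // simple consistency check: both dimensions should match
        int numDims2 = sspace.getVectorLength();
        System.out.println("dim1 " + numDims + ", dim2 " + numDims2);
    }
}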


-----------------------------------------