Automatically(?) cleaning up temporary files?


Johann Petrak

Apr 17, 2015, 1:51:33 PM
to s-spac...@googlegroups.com
S-Space often creates temporary files internally. In our application we see a lot
of files created with names similar to
  /tmp/matlab-sparse-matrix6734086930884435396.dat.matrix-transform4271308682942766012.dat
or
  /tmp/matlab-sparse-matrix8440308432346724094.dat
Is it possible to have sspace remove those files automatically, or is there some API method to
clean them up? If not, what is the best way to remove them: how do we obtain all the necessary
paths, and at which point is it safe to remove them? As we are only using high-level API calls, we were
not even aware initially that this was happening, and I am not sure how to find out which
files are created internally by which calls.
In our case, a large number of those files get created over time, and the huge number of files
in /tmp can lead to slow-downs and other problems.

Thanks,
  johann

Johann Petrak

Apr 19, 2015, 11:00:29 AM
to s-spac...@googlegroups.com
Here is a short, completely self-contained Groovy script to illustrate this problem:

@Grab(group='edu.ucla.sspace', module='sspace', version='2.0.4')
@Grab(group='colt', module='colt', version='1.2.0')
import edu.ucla.sspace.common.SemanticSpace;
import edu.ucla.sspace.vsm.VectorSpaceModel;
import java.io.BufferedReader;
import java.io.StringReader;

SemanticSpace semSpace = new VectorSpaceModel();
semSpace.processDocument(new BufferedReader(new StringReader("string")));
Properties config = new Properties();
config.put(semSpace.MATRIX_TRANSFORM_PROPERTY, "edu.ucla.sspace.matrix.NoTransform");
semSpace.processSpace(config);


Is there a way to modify this script so that the matlab-sparse-matrix* file gets removed automatically,
or is there at least a way to find out which file was created so it can be removed by the script?
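
One generic way to find out which files a call created is to compare a directory listing taken before the call with one taken after it. The following is a minimal Java sketch of that idea, not an S-Space feature: the class `TempFileDiff` and its methods are hypothetical names, and it assumes the files land in `java.io.tmpdir` with the `matlab-sparse-matrix` prefix seen above. It can misattribute files if another thread or process creates matching files at the same time.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of S-Space.
final class TempFileDiff {

    /** Lists the entries directly inside dir whose names start with prefix. */
    static Set<Path> snapshot(Path dir, String prefix) throws IOException {
        Set<Path> found = new TreeSet<>();
        // The second argument is a glob pattern; prefix must not contain glob metacharacters.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, prefix + "*")) {
            for (Path p : stream) {
                found.add(p);
            }
        }
        return found;
    }

    /** Runs the action and returns the matching files that appeared while it ran. */
    static Set<Path> createdDuring(Path dir, String prefix, Runnable action) throws IOException {
        Set<Path> before = snapshot(dir, prefix);
        action.run();
        Set<Path> after = snapshot(dir, prefix);
        after.removeAll(before);
        return after;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        Set<Path> created = createdDuring(tmp, "matlab-sparse-matrix", () -> {
            // Place the S-Space calls here, e.g. semSpace.processSpace(config).
        });
        for (Path p : created) {
            System.out.println("created: " + p);
            // Delete only once the semantic space no longer needs the file:
            // Files.deleteIfExists(p);
        }
    }
}
```

Whether a reported file can be deleted right away depends on whether the semantic space still reads from it, which this sketch cannot determine.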

David Jurgens

Apr 19, 2015, 4:37:12 PM
to s-spac...@googlegroups.com
Hi Johann,

  It looks like the code that was creating these temporary files wasn't correctly marking them for deletion on JVM exit, which, as you found out, causes them to pile up in /tmp.  I've gone through the code base and cleaned up all the places where we create such files, so hopefully this behavior has now stopped.  The files will still be generated and appear at run time, but should be deleted if the JVM exits normally.  The changes have been pushed to the trunk, so please let me know if you see any different behavior.

  Thanks,  
  David
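
For readers unfamiliar with the mechanism: marking a file for deletion on JVM exit is done in standard Java with `File.deleteOnExit()`. The sketch below only illustrates that standard call. It is not the S-Space change itself, and the class and method names (`DeleteOnExitDemo`, `createScratchFile`) are hypothetical.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical demo class, not part of S-Space.
final class DeleteOnExitDemo {

    /** Creates a temp file and registers it for deletion at normal JVM shutdown. */
    static File createScratchFile(String prefix, String suffix) throws IOException {
        File f = File.createTempFile(prefix, suffix);
        f.deleteOnExit();
        return f;
    }

    public static void main(String[] args) throws IOException {
        File f = createScratchFile("matlab-sparse-matrix", ".dat");
        // The file is present for the whole run and removed when the JVM
        // terminates normally; a hard kill of the process leaves it behind.
        System.out.println(f + " exists during the run: " + f.exists());
    }
}
```

Note that every `deleteOnExit()` call keeps the path registered for the lifetime of the JVM, so in a long-running process the files and the registrations both accumulate until shutdown.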

--

---
You received this message because you are subscribed to the Google Groups "S-Space Package Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to s-space-user...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Johann Petrak

Apr 20, 2015, 8:37:30 AM
to s-spac...@googlegroups.com
Hi David, thank you so much, this is really helpful!
One more thing I am wondering about now is this: we use the library also in a server
which runs an application for incoming REST requests. Part of that application involves
using the sspace library to essentially do something similar to what I put in the example script.
Now, since the server ideally would run over weeks or months without getting restarted,
the cleanup-on-JVM exit method will not help. So in that context, is there anything we could
do already to still clean up those files? I assume that the time-span during which we can
expect the files to be needed is very short, so in our specific situation, an external
job that cleans all files older than a day could be a workaround.
But ideally the client code would have control: would a solution where the client app
can register a callback that is invoked whenever a file is no longer needed be preferable?
What are your thoughts on this?

Many thanks,
  johann
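
The age-based workaround mentioned above can be sketched in plain Java as follows. This is a sketch under stated assumptions, not an S-Space facility: the class `TempFileSweeper` and its `sweep` method are hypothetical names, and it assumes that any `matlab-sparse-matrix*` file older than the chosen age is no longer in use.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of S-Space.
final class TempFileSweeper {

    /**
     * Deletes regular files directly inside dir whose names start with prefix
     * and whose last-modified time is older than maxAge. Returns the number deleted.
     */
    static int sweep(Path dir, String prefix, Duration maxAge) throws IOException {
        Instant cutoff = Instant.now().minus(maxAge);
        int deleted = 0;
        // The second argument is a glob pattern; prefix must not contain glob metacharacters.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, prefix + "*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)
                        && Files.getLastModifiedTime(p).toInstant().isBefore(cutoff)
                        && Files.deleteIfExists(p)) {
                    deleted++;
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        // Directory to sweep: first argument if given, otherwise the JVM temp directory.
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        int n = sweep(dir, "matlab-sparse-matrix", Duration.ofDays(1));
        System.out.println("deleted " + n + " stale files");
    }
}
```

Inside a server, the same `sweep` call could be run periodically, for example from a `ScheduledExecutorService`, instead of from an external job.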


Johann Petrak

Apr 20, 2015, 8:53:40 AM
to s-spac...@googlegroups.com
Thank you for making these changes!
I am not sure if I am doing something wrong, but a freshly cloned copy of the
latest version does not compile on my computer. After cloning and changing into the
directory I do
  ./opt/add_non_maven_jars.sh
this seems to work fine, then
  mvn package
or
  mvn compile
either of those aborts with the following errors:

[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /data/johann/develop/java/sspace-github/src/main/java/edu/ucla/sspace/matrix/SVD.java:[28,43] error: cannot find symbol
[ERROR]  package edu.ucla.sspace.matrix.factorization
/data/johann/develop/java/sspace-github/src/main/java/edu/ucla/sspace/matrix/SVD.java:[158,23] error: cannot find symbol
[ERROR]  class SVD
/data/johann/develop/java/sspace-github/src/main/java/edu/ucla/sspace/matrix/SVD.java:[172,34] error: cannot find symbol
[INFO] 3 errors

It seems the referenced class edu.ucla.sspace.matrix.factorization.SingularValueDecompositionColt is indeed not
there any more?

David Jurgens

Apr 20, 2015, 9:02:56 AM
to s-spac...@googlegroups.com
Hi Johann,

  It looks like some of the changes to the SVD class were unintentionally included when I fixed the temp file issue.  I've corrected these and pushed the changes to the trunk, so everything should compile now.  Sorry for the confusion!

  Regarding your earlier question, let me see if I can figure something out in the next few days.  I haven't really looked into the lifecycle of the temp files, so I'm not sure where the appropriate points are for signaling that a file is no longer needed.  However, I think something could be worked out using the garbage collection facilities on the File instances themselves.

  Thanks,
  David
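
One way the garbage-collection idea could look is sketched below. It is an assumption about a possible design, not what S-Space does: it relies on `java.lang.ref.Cleaner`, which requires Java 9 or later, and the wrapper class `SelfCleaningTempFile` is a hypothetical name. The file is deleted either when `close()` is called or some time after the wrapper becomes unreachable.

```java
import java.io.File;
import java.io.IOException;
import java.lang.ref.Cleaner;

// Hypothetical wrapper, not part of S-Space. Requires Java 9+ for java.lang.ref.Cleaner.
final class SelfCleaningTempFile implements AutoCloseable {
    private static final Cleaner CLEANER = Cleaner.create();

    /** Cleanup action. It must not reference the wrapper, or the wrapper could never become unreachable. */
    private static final class DeleteAction implements Runnable {
        private final File file;

        DeleteAction(File file) {
            this.file = file;
        }

        @Override
        public void run() {
            file.delete();
        }
    }

    private final File file;
    private final Cleaner.Cleanable cleanable;

    SelfCleaningTempFile(String prefix, String suffix) throws IOException {
        this.file = File.createTempFile(prefix, suffix);
        this.cleanable = CLEANER.register(this, new DeleteAction(file));
    }

    File file() {
        return file;
    }

    /** Deletes the file now. The cleanup action runs at most once, so repeated calls are harmless. */
    @Override
    public void close() {
        cleanable.clean();
    }

    public static void main(String[] args) throws IOException {
        try (SelfCleaningTempFile tmp = new SelfCleaningTempFile("matlab-sparse-matrix", ".dat")) {
            System.out.println("working with " + tmp.file());
        }
        // The file is gone here, without waiting for JVM exit.
    }
}
```

The garbage collector gives no guarantee about when an unreachable wrapper is cleaned, so explicit `close()` remains the reliable path. Code that keeps only the `File` and drops the wrapper risks having the file deleted while it is still in use.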

Johann Petrak

Aug 10, 2015, 6:16:39 AM
to S-Space Package Users
Hi David, 

thanks for this - unfortunately I was too busy to get back to this in a constructive way. The fix caused other problems so I had to go back to the version that created all those
temporary files for a while. I have now found some time to give more feedback:

The fix seems to introduce a new problem: the same code that worked fine previously (but created the temporary files) now throws array-out-of-bounds exceptions.

Here is a somewhat minimal test case. When you run it with the new jar, it throws an exception; when you run it with the old jar, it works fine and creates a file in /tmp.

//---------- SNIP
import edu.ucla.sspace.common.DocumentVectorBuilder;
import edu.ucla.sspace.common.SemanticSpace;
import edu.ucla.sspace.vector.DoubleVector;
import edu.ucla.sspace.vsm.VectorSpaceModel;
import edu.ucla.sspace.vector.DenseVector;


import java.io.BufferedReader;

SemanticSpace candidateListSemSpace = new VectorSpaceModel();
candidateListSemSpace.processDocument(new BufferedReader(new StringReader("This is some text")));


Properties tficfConfig = new Properties();
tficfConfig.put(VectorSpaceModel.MATRIX_TRANSFORM_PROPERTY,"edu.ucla.sspace.matrix.TfIdfTransform");
candidateListSemSpace.processSpace(tficfConfig);
DocumentVectorBuilder tficfVectorBuilder = new DocumentVectorBuilder(candidateListSemSpace);
DoubleVector tficfContextVector = tficfVectorBuilder.buildVector(
    new BufferedReader(new StringReader("This is also some text")),
    new DenseVector(candidateListSemSpace.getVectorLength()));
// ------------------------ SNAP

I have created issue 69 for this: https://github.com/fozziethebeat/S-Space/issues/69

thanks,
  Johann