Data split

Oscar Deniz

May 29, 2015, 7:55:45 AM
to express...@googlegroups.com
Hi All,

This Expresso application is awesome. I have one question: does anyone know whether the data split (training and testing) produces a training set with the same number of images per class?

Oscar

Jaley Dholakiya

May 29, 2015, 8:29:00 AM
to express...@googlegroups.com
Right now it simply splits the data at an index, without balancing the classes. However, you can easily change this by modifying the following file:
$EXPRESSO_ROOT/src/data/Splitter.py
In my explanation, I am assuming some basic familiarity with Python.
In Splitter.py, the variable "dataName" is the name of the HDF5 file in $EXPRESSO_ROOT/data, and the variable "value" is the split index. The program splits the data referred to by "dataName" at index "value" and saves the two parts in the same folder, adding an appropriate suffix to each file name.
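A minimal sketch of the index-based split described above, assuming h5py and NumPy are installed and that the file holds two datasets named "data" and "label". The function name `split_hdf5_at_index`, the `_train`/`_test` suffixes and the gzip compression are hypothetical choices for illustration; the actual Splitter.py may do this differently.

```python
import os

import h5py
import numpy as np


def split_hdf5_at_index(path, value, suffixes=("_train", "_test")):
    """Split an HDF5 file with "data" and "label" datasets at row `value`.

    Hypothetical helper: the suffixes and gzip compression are assumptions,
    not necessarily what Expresso's Splitter.py uses.
    """
    # Read both 4-D blobs fully into memory as NumPy arrays.
    with h5py.File(path, "r") as src:
        data = np.asarray(src["data"][...])
        label = np.asarray(src["label"][...])

    root, ext = os.path.splitext(path)
    out_paths = []
    # Rows [0, value) go to the first file, rows [value, N) to the second.
    for suffix, rows in zip(suffixes, (slice(None, value), slice(value, None))):
        out_path = root + suffix + ext
        with h5py.File(out_path, "w") as dst:
            dst.create_dataset("data", data=data[rows], compression="gzip")
            dst.create_dataset("label", data=label[rows], compression="gzip")
        out_paths.append(out_path)
    return out_paths
```

For example, calling `split_hdf5_at_index("mydata.h5", 5000)` on a file in $EXPRESSO_ROOT/data would write `mydata_train.h5` with the first 5000 samples and `mydata_test.h5` with the rest (the file name is illustrative). Note that a plain index split keeps whatever class proportions happen to fall on each side of `value`.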

Internally, the data is stored in compressed HDF5 format as a dictionary with two keys: "data" and "label". Both blobs are 4-dimensional. You can modify the split locally however you wish, and the change will be reflected in Expresso. If you are not familiar with the HDF5 format, you can use Python's h5py library to modify the files.
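To get the same number of images per class in the training set, as asked above, one option is to pick training indices per class (a stratified, class-balanced selection) instead of using a single split index. A minimal NumPy sketch, assuming each sample's class is the first value of its 4-D label entry (for example, a label blob of shape (N, 1, 1, 1)); `balanced_split` and its parameters are hypothetical and not part of Expresso:

```python
import numpy as np


def balanced_split(label, n_per_class, seed=0):
    """Return (train_idx, test_idx) with exactly n_per_class samples of each class in train.

    Hypothetical helper; assumes the class id is the first value of each label entry.
    """
    # Flatten the 4-D label blob to one class id per sample.
    y = np.asarray(label).reshape(len(label), -1)[:, 0]
    rng = np.random.default_rng(seed)
    train_idx = []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        if len(idx) < n_per_class:
            raise ValueError(f"class {cls} has only {len(idx)} samples")
        # Randomly choose n_per_class distinct samples of this class for training.
        train_idx.extend(rng.choice(idx, size=n_per_class, replace=False).tolist())
    # Sorted indices keep the original sample order.
    train_idx = np.sort(np.array(train_idx, dtype=int))
    # Every sample not selected for training goes to the test set.
    test_idx = np.setdiff1d(np.arange(len(y)), train_idx)
    return train_idx, test_idx
```

The returned indices can then slice both blobs, e.g. `data[train_idx]` and `label[train_idx]`, before writing them out with h5py. Sorting the indices also matters if you index an h5py dataset directly, since h5py only accepts increasing index lists for that kind of fancy indexing.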
Let me know if you find it difficult to change.