Client ID for UBM training?

Prasanna Kothalkar

Jul 28, 2017, 5:59:26 PM

to bob-devel

Hello,

I am trying to use joint factor analysis in spear on my own dataset, so I created my own config file for the database and am passing that file to the verify command. My training data is unlabelled, and since UBM training is unsupervised, my understanding is that a single column of file names in 'train_world.lst' should suffice. However, this gives me an error, and on checking the code in bob/bio/base/database/filelist/models.py I found that the 'train_world.lst' file is validated to have 2 columns, as shown below.

    def _read_column_list(self, list_file, column_count):
        # read the list
        rows = self._read_multi_column_list(list_file)
        # extract the file from the first two columns
        file_list = []
        for row in rows:
            if column_count == 2:
                assert len(row) == 2
                # we expect: filename client_id
                file_list.append(FileListFile(file_name=row[0], client_id=row[1]))
            elif column_count == 3:
                assert len(row) in (2, 3)
                # we expect: filename, model_id, client_id
                file_list.append(FileListFile(file_name=row[0], client_id=row[2] if len(row) > 2 else row[1], model_id=row[1]))
            elif column_count == 4:
                assert len(row) in (3, 4)
                # we expect: filename, model_id, claimed_id, client_id
                file_list.append(FileListFile(file_name=row[0], client_id=row[3] if len(row) > 3 else row[1], model_id=row[1],
                                              claimed_id=row[2]))
            else:
                raise ValueError(
                    "The given column count %d cannot be interpreted. This is a BUG, please report to the author." % column_count)
        return file_list

Please let me know if I am missing some information about the library regarding unsupervised UBM-GMM training. I have tried the jfa and gmm algorithms, and my command-line call is as follows:

verify.py -d /home/prasanna/anaconda2/lib/python2.7/site-packages/bob/bio/spear/config/database/custom.py -p energy-2gauss -e mfcc-60 -a gmm -s gmm --groups {world,dev,eval}

Thanks,

Prasanna

Manuel Günther

Jul 28, 2017, 6:13:52 PM

to bob-devel

Dear Prasanna,

The file list database is designed to be generic, i.e., it should also be usable for other algorithms such as ISV, which require training samples from several different clients.

In your case, you are right: there is no need to have different clients (and hence no client IDs) to train the UBM. However, the file list still has to specify a client ID in each row. Since you don't have one (and don't need one), you can simply write a placeholder, e.g., -1, as the client ID in each row.
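A minimal sketch of that workaround, assuming the existing list has one file name per line: the helper `add_placeholder_client_id` below is hypothetical (it is not part of bob) and only appends a placeholder client ID to every row that lacks one, so that each row has the two columns the file list reader expects. The value -1 is an arbitrary choice; pick any value that does not collide with a real client ID in your data.

```python
def add_placeholder_client_id(rows, placeholder="-1"):
    """Return 'filename client_id' rows for a two-column file list.

    Hypothetical helper, not part of bob. Rows with a single column get
    the placeholder appended; rows that already have more than one
    column are kept as they are; blank rows are dropped.
    """
    result = []
    for row in rows:
        columns = row.split()  # columns are whitespace-separated
        if not columns:
            continue  # skip blank lines
        if len(columns) == 1:
            columns.append(placeholder)
        result.append(" ".join(columns))
    return result


def convert_list_file(source_path, target_path, placeholder="-1"):
    """Read a single-column list and write it back with two columns."""
    with open(source_path) as source:
        rows = add_placeholder_client_id(source, placeholder)
    with open(target_path, "w") as target:
        target.write("\n".join(rows) + "\n")
```

For example, `convert_list_file("files_only.lst", "train_world.lst")` would write the two-column list; both file names here are assumptions used only for illustration.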

BTW: you can store your database configuration 'custom.py' in any directory you want; it does not need to be stored inside the spear configuration file directory.

Best regards

Manuel

Prasanna Kothalkar

Jul 28, 2017, 6:57:24 PM

to bob-devel

Dear Manuel,

Yes, I am just running an experiment with 1 as the client ID, but I will probably change it to -1, since one of my classes is 1. It is good to have information from the experts. Your point about the spear configuration directory makes sense.