Question about GMM statistics and UBM


Marta Gomez-Barrero

Oct 1, 2013, 11:41:45 AM
to bob-...@googlegroups.com
Hi, 

I originally posted this question in the bug-report section, where "laurentes" answered me, and I'm moving it here.

My intention is to develop some code, starting with the usage of an ISV system. Therefore, running the scripts on the command line is not an option.

What I need to do is understand the ISV tool from the facereclib and/or bob. I thought the facereclib would be easier... but I cannot figure out how to train the machine.

I've read your answer... thanks for the info! But I still have some doubts about the practical implementation. What I need is:

- I have a database (AT&T for now) with its interface, and I can read the images fine. No problem here.

- I want to run ISV on that database (not run the script, but have the code so that in the next step of my coding I can add some extra layer after ISV). Here's my problem: I've seen the machine and trainer in Bob, I've seen the tool in facereclib... and I'm quite confused about how to use them.

Thanks!
Marta

Laurent El Shafey

Oct 1, 2013, 11:57:40 AM
to bob-...@googlegroups.com
Hello,

First, the facereclib code is indeed pretty tricky, since it has several layers of abstraction in order to support several algorithms.

From what you're saying, I think that a good starting point will be the
following satellite package of bob:
https://pypi.python.org/pypi/xbob.example.faceverify
(It is also hosted on github).

The above-mentioned package provides a DCT/UBM-GMM example. Since ISV is
built on top of a DCT/UBM-GMM system, I think the easiest solution for
you would be to adapt the script:
xbob/example/faceverify/dct_ubm.py
and to integrate the additional ISV code. This will definitely require
some work.

In addition, this package relies on the AT&T database, which seems to
suit your needs.

Some help/remarks to get you started:
1. Training: The UBM-GMM training is unsupervised (no need for
class/client information), whereas the ISV training is supervised. This
implies that you will have to query the objects from the database API in
a slightly different way before calling the training procedure for ISV.
2. Enrollment: You will indeed need to compute the GMM statistics. You
can have a look at the stats() method in the above-mentioned script for
this purpose.
3. Scoring: I don't think there will be any complication there.
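To make point 1 concrete, here is a minimal sketch of the two data layouts in plain Python; the dictionary name, client ids, and feature shapes are purely illustrative and not the actual Bob/facereclib API:

```python
import numpy

# Hypothetical training set: client id -> list of per-image feature arrays
# (names and shapes here are made up for illustration)
features_by_client = {
    1: [numpy.random.rand(10, 20), numpy.random.rand(10, 20)],
    2: [numpy.random.rand(10, 20)],
}

# UBM-GMM training is unsupervised: pool every feature into one flat list,
# discarding the client information
ubm_training_set = [f for feats in features_by_client.values() for f in feats]

# ISV training is supervised: keep one list of features per client
isv_training_set = list(features_by_client.values())
```

The only difference is how the same features are grouped before being handed to the respective trainer.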

I'll let you have a look at this package on your own for now. If you
have any problems achieving your goal, just let us know, being as
specific as you can.

Cheers,
Laurent (aka laurentes on github)
> --
> -- You received this message because you are subscribed to the Google
> Groups bob-devel group. To post to this group, send email to
> bob-...@googlegroups.com. To unsubscribe from this group, send email
> to bob-devel+...@googlegroups.com. For more options, visit
> this group at https://groups.google.com/d/forum/bob-devel or directly
> the project website at http://idiap.github.com/bob/
> ---
> You received this message because you are subscribed to the Google
> Groups "bob-devel" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to bob-devel+...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

Marta Gomez-Barrero

Oct 2, 2013, 2:34:12 AM
to bob-...@googlegroups.com, laurent....@idiap.ch
Thank you Laurent!

I already have that package, so I'll get started with it.

Best,
Marta

Manuel Günther

Oct 2, 2013, 4:41:33 AM
to bob-...@googlegroups.com, laurent....@idiap.ch
Dear Marta,

indeed, the xbob.example.faceverify package is usually a good starting point to understand what is going on. I would also suggest reading the tutorial for the GMM and ISV trainers on Bob's webpage:
http://www.idiap.ch/software/bob/docs/releases/last/sphinx/html/TutorialsTrainer.html#inter-session-variability
I hope that sheds some light on it. I have to admit that ISV is quite complicated to understand and to implement.

The FaceRecLib itself is designed to run comparable face recognition experiments. The implementation of ISV in the FaceRecLib is, as Laurent already pointed out, relatively complicated to read since it is optimized and spread over several files.
E.g., the UBM training is implemented in the base class facereclib/tools/UBMGMM.py. This training is called during the ISV training in facereclib/tools/ISV.py, line 104: UBMGMM._train_projector_using_array(self, data1), before the actual ISV training is started.

As Laurent pointed out, the training of the UBM and of ISV needs the data in different formats. While the UBM is client-independent, ISV needs the training data separated by client. This is implemented in the FaceRecLib's version of ISV. Since the UBM/GMM system from
xbob.example.faceverify does not need client information, it does not extract it, but this should not be a big problem. You might want to have a look at the facereclib/databases/Database.arrange_by_client(...) function.


The AT&T database is a nice small database that you can use to implement and test your algorithm. Note that this database is *outdated* and should not be used in any published experiment. Also, since this database is small, it might not give good results for ISV (which usually requires more training data).
To run comparable experiments, I would suggest adapting the FaceRecLib (e.g., by adding another Tool class that implements your algorithm) and running your experiments with it. This will save you a lot of time implementing the database interface, the preprocessing, the feature extraction, and the evaluation.
Of course, first you have to get your algorithm to run.

Best regards
Manuel

Marta Gómez Barrero

Oct 2, 2013, 4:45:18 AM
to bob-...@googlegroups.com, laurent....@idiap.ch
Dear Günther,

thanks for all those comments, I really appreciate any help!

I'm working with the scripts and reading those parts of the tutorial. I'll let you know if I have any other question.

Best regards,
Marta



Marta Gómez Barrero

Oct 2, 2013, 4:51:55 AM
to bob-...@googlegroups.com, laurent....@idiap.ch
PS: I'm only using this database for the initial experiments, to make sure that everything does what it's supposed to do. When that's done, I'll use a newer and bigger database for the "official" results.



Marta Gomez-Barrero

Oct 6, 2013, 6:16:15 AM
to bob-...@googlegroups.com, laurent....@idiap.ch
Hi Laurent and Günther,

In the end I'm using the facereclib... I finally "saw the light" and I think I've learned how to use it.

My problem now is that apparently everything goes well, with no errors or warnings... but when I evaluate the scores, I get EER = 50%! So something is definitely wrong.

I'm running the tests with the Gabor graph matching algorithm (just because it's much faster for tests), and I get the same problem. So I guessed the mistake was in reading the images. However, when I print the files and the ids, everything seems correct. But when I print the features, here's the problem: for all the images (regardless of the user) I obtain the same features, even though the images are not identical. This leads to the same score for all images against all models, and thus the 50% EER.

Here's the code I implemented. I copied the initialization from an example of one of your packages:

atnt_db = facereclib.databases.DatabaseXBob(
    database = xbob.db.atnt.Database(),
    name = "gbu",
    original_directory = "/Users/martagomezbarrero/Downloads/orl_faces",
    original_extension = ".pgm",
)

# Gabor grid graphs for the Gabor graphs algorithm:
gabor_graph_feature_extractor = facereclib.features.GridGraph(
    # Gabor parameters
    gabor_sigma = math.sqrt(2.) * math.pi,
    # what kind of information to extract
    normalize_gabor_jets = True,
    extract_gabor_phases = True,
    # setup of the fixed grid
    first_node = (4, 4),
    #image_resolution = (CROPPED_IMAGE_HEIGHT, CROPPED_IMAGE_WIDTH),
    image_resolution = (112, 92),
    node_distance = (8, 8)
)

# Gabor graphs: Use the similarity function incorporating the Gabor phase difference and the Canberra distance
gabor_graph_tool = facereclib.tools.GaborJets(
    # Gabor jet comparison
    gabor_jet_similarity_type = bob.machine.gabor_jet_similarity_type.PHASE_DIFF_PLUS_CANBERRA,
    # Gabor wavelet setup; needs to be identical to the feature extractor
    gabor_sigma = math.sqrt(2.) * math.pi
)

I also adapted the functions for loading images to what I needed:

#######################################################################
### Functions for loading images and for load + extract

def load_images_enrol(db):
  """Reads the enrollment images for each model id from the given database"""
  # get the model ids from the database
  model_ids = db.model_ids()

  # iterate through the models and read their enrollment images
  images = []
  for k in model_ids:
    lst = []
    files = db.enroll_files(k)
    for f in files:
      image = bob.io.load(f.make_path(db.original_directory, db.original_extension))
      #image = tan_triggs_preprocessor(image)
      lst.append(image)
    images.append(lst)

  return images, model_ids

def load_images_probe(db):
  """Reads the probe images and their client ids from the given database"""
  # get the file names from the database
  files = db.probe_files()

  # iterate through the list of files and read the images
  images = []
  ids = []
  for f in files:
    image = bob.io.load(f.make_path(db.original_directory, db.original_extension))
    #image = tan_triggs_preprocessor(image)
    images.append(image)
    ids.append(db.m_database.get_client_id_from_file_id(f.id))

  return images, ids


And finally, here's the code I'm running:

print "Extracting and enrolling models"
model_images, model_ids = load_images_enrol(atnt_db)
models = []
for user in model_images:
  lst = []
  for image in user:
    feat = gabor_graph_feature_extractor(image)
    lst.append(feat)
  model = gabor_graph_tool.enroll(lst)
  models.append(model)

print "Extracting and scoring probes"
positive_scores = []
negative_scores = []

probe_images, probe_ids = load_images_probe(atnt_db)
probe_features = []
for image in probe_images:
  feat = gabor_graph_feature_extractor(image)
  probe_features.append(feat)

counter1 = 0
for model in models:
  k = model_ids[counter1]
  counter = 0
  for feat in probe_features:
    if probe_ids[counter] == k:
      positive_scores.append(gabor_graph_tool.score(model, feat))
    else:
      negative_scores.append(gabor_graph_tool.score(model, feat))
    counter += 1
  counter1 += 1
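As a side note, the EER over such positive/negative score lists can be computed with a naive threshold sweep; this is a plain-numpy sketch for illustration, not the bob.measure implementation:

```python
import numpy

def compute_eer(negatives, positives):
  """Naive equal error rate: sweep every observed score as a threshold
  and return the operating point where FAR and FRR are closest."""
  negatives = numpy.asarray(negatives)
  positives = numpy.asarray(positives)
  best_gap, best_eer = None, None
  for threshold in numpy.sort(numpy.concatenate((negatives, positives))):
    far = numpy.mean(negatives >= threshold)  # impostor scores accepted
    frr = numpy.mean(positives < threshold)   # genuine scores rejected
    gap = abs(far - frr)
    if best_gap is None or gap < best_gap:
      best_gap, best_eer = gap, (far + frr) / 2.0
  return best_eer
```

With well-separated score distributions this returns 0.0; when the positive and negative scores are identically distributed, as in my broken run, it returns 0.5, i.e. 50% EER.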

I'm sorry for all the trouble I'm causing, but I have no clue where the error may be. The .py file I'm using is also attached.

Thanks a lot and best regards,
Marta
prueba6.py

Manuel Günther

Oct 6, 2013, 9:23:25 AM
to bob-...@googlegroups.com, laurent....@idiap.ch
Dear Marta,

in fact, the FaceRecLib is designed to *run* the algorithms using the ./bin/faceverify.py script. This means that some optimizations are included, which cause your problems.

Anyway, you can use the FaceRecLib tools directly, but you have to take the optimizations into account. One of them is that the feature extraction step re-uses memory. For example, when you write:

for image in probe_images:
  feat = gabor_graph_feature_extractor(image)
  probe_features.append(feat)

the feature returned by the extractor is *always the same memory*. Hence, if you want to store the features, you have to copy them, e.g.:

...
probe_features.append(copy.deepcopy(feat))

This should solve your problem, though I haven't tested your code. Please note that you have the same code in the model feature extraction.
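A self-contained toy example of this pitfall; the ReusingExtractor class is made up for illustration, and only its memory re-use behavior mimics the real extractor:

```python
import copy
import numpy

class ReusingExtractor(object):
    """Illustrative stand-in for a feature extractor that re-uses its
    output buffer between calls (hypothetical class, not the real API)."""
    def __init__(self):
        self._buffer = numpy.zeros(3)

    def __call__(self, value):
        self._buffer[:] = value   # overwrites the same memory every call
        return self._buffer       # returns a reference, not a copy

extractor = ReusingExtractor()

# storing the returned reference: every list entry points to the same
# array, so all entries end up holding the last extracted value
shared = [extractor(v) for v in (1.0, 2.0, 3.0)]

# storing a deep copy: every list entry keeps its own values
copied = [copy.deepcopy(extractor(v)) for v in (1.0, 2.0, 3.0)]
```

After the first loop, all three entries of `shared` contain the value 3.0; the `copied` list keeps 1.0, 2.0, and 3.0 as intended.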

Best regards
Manuel (not Günther since this is my last name...) 

Marta Gómez Barrero

Oct 6, 2013, 9:42:12 AM
to bob-...@googlegroups.com, Laurent El Shafey
Dear Manuel,

sorry about the name!! No more Günther.

And thanks! I hadn't realized that no actual values were returned, only references. Laurent provided a similar explanation, and now the problem is solved. Thank you both!!

Best regards,
Marta

