Using client id in bob.bio.base

Vedrana Krivokuca

Nov 3, 2017, 5:23:14 AM

to bob-devel

Hello all,

For a paper package (as well as for my work on template protection in general), I need to use a subject's client id to generate their protected biometric template. The problem is that the client id is not passed through the verify.py pipeline, so it seems that I will have to modify bob.bio.base in some way to achieve this. At the moment, the solution I'm thinking of is the following:

In bob.bio.base's Algorithm class, add a parameter "requires_client_id" in __init__(). Set it to False by default. So:

class Algorithm(object):
    def __init__(
        self,
        requires_client_id=False,
    ):
        self.requires_client_id = requires_client_id

Then, I would create a class that inherits from Algorithm, and in this class I would set requires_client_id=True. In this class, my "protection" will be implemented in the "project" method.

Then, in bob.bio.base.tools.algorithm:

def project(algorithm, extractor, groups=None, indices=None, allow_missing_files=False, force=False):
    ... (other code)

    # the file selector object
    fs = FileSelector.instance()

    ... (other code)

    original_data_files = fs.original_data_list(groups=groups)

    ... (other code)

    # extract the features
    for i in index_range:
        feature_file = feature_files[i]
        projected_file = projected_files[i]

        ... (other code)

            # project feature
            if algorithm.requires_client_id:
                file_object = original_data_files[i]
                projected = algorithm.project(feature, file_object.client_id)  # protected feature
            else:
                projected = algorithm.project(feature)  # unprotected feature

        ... (other code)
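
A minimal, self-contained sketch of the dispatch proposed above. It does not use bob.bio.base: `Algorithm` here is a stand-in class, and `ProtectedAlgorithm`, `project_all`, and the seeded permutation used as the "protection" are hypothetical names and choices made only for illustration.

```python
import random


class Algorithm(object):
    """Stand-in for bob.bio.base's Algorithm, reduced to the proposed flag."""

    def __init__(self, requires_client_id=False):
        self.requires_client_id = requires_client_id

    def project(self, feature):
        # Default behaviour: return the feature unchanged (unprotected).
        return list(feature)


class ProtectedAlgorithm(Algorithm):
    """Hypothetical subclass whose project() also needs the client id."""

    def __init__(self):
        super(ProtectedAlgorithm, self).__init__(requires_client_id=True)

    def project(self, feature, client_id):
        # Placeholder protection: a permutation seeded by the client id.
        rng = random.Random(str(client_id))
        order = list(range(len(feature)))
        rng.shuffle(order)
        return [feature[i] for i in order]


def project_all(algorithm, features, client_ids):
    """Mimics the projection loop with the proposed requires_client_id branch."""
    projected = []
    for feature, client_id in zip(features, client_ids):
        if algorithm.requires_client_id:
            projected.append(algorithm.project(feature, client_id))
        else:
            projected.append(algorithm.project(feature))
    return projected
```

Algorithms that leave the flag at its default are called exactly as before, so existing code paths would not be affected.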

This does not seem like a long-term solution, because sometimes the protection might need to occur at a different stage of the pipeline. So, maybe eventually we should think about introducing a "protect" module somewhere. But for now, in order to ensure the least impact for other people, please let me know what you think about the above solution.

By the way, I know I could have just created a merge request for this in bob.bio.base Git repository (instead of explaining the code here), but I would like to see if anyone has a better suggestion before I go through this effort.

Thank you!

Vedrana

Manuel Günther

Nov 3, 2017, 12:53:25 PM

to bob-devel

Dear Vedrana,

I am not sure why you would need the client_id in the projection step. The projection step is most likely not where you want to create a (protected) template; that would rather be the enrollment stage, which is responsible for creating templates.

Anyways, in the enrollment step the `Algorithm.enroll` function also does not know anything about the client_id.

So far, I have never had a case where the client_id needed to be present anywhere in bob.bio, so this was not part of my design. In fact, the client_id is only used when writing the score files. However, your solution should work just as well for the enrollment step as for the projection step.

Manuel

Vedrana Krivokuca

Nov 6, 2017, 3:03:38 AM

to bob-devel

Dear Manuel,

Thank you for your reply. I have already thought about incorporating the template protection into the enrollment stage; however, this is unsuitable for my purposes. This is because I need matching to be performed **in the protected domain**, which means that both the model AND the probe need to be protected. If I implement the protection step in the enrollment stage, only the models will be protected. What I really need is to "protect" the entire set of biometric samples that are to be used in the recognition pipeline (so this will depend on the matching protocol used). In my initial implementation, I incorporated the protection step in the extraction stage. This worked fine for me, but it meant that the extraction had to be re-done each time I changed the parameters in the protection function. So, to make it easier on the user of my paper package, I would like to implement the protection step in the projection stage. This way:

1.) The extraction needs to be done only once.
2.) All the biometric samples I need for matching (i.e., both models and probes) will be protected.

Like I said, what I would really like is to have a "protection" stage in the verify.py pipeline. However, for now I might have to settle for a simpler solution, due to time constraints. For this reason, the "projection" implementation seems to be the best option given my requirements and the way that bob.bio is set up.

As to why I need the client id, I explained this in my initial post but maybe it was not clear. To cut a long story short, for this particular paper package I need a user-specific 'seed'. This seed will generate a user-specific matrix, which is used for the "protection". The seed is meant to be random but repeatable. Since I used the client id as the seed to generate the results for my paper, I need to use the same thing for the final paper package so that my users can reproduce the results. The only other option I can think of at this stage is to write the client ids or seeds in a textfile and read them from there, but wouldn't it be nicer to use the bob.bio framework?
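
The "random but repeatable" property can be sketched as follows. This is only an illustration, assuming the protection is a multiplication by a seeded random matrix; `user_matrix` and `protect` are hypothetical names, and the actual transform in the paper package may differ.

```python
import random


def user_matrix(client_id, rows, cols):
    """Build a user-specific matrix that is reproducible from the client id."""
    # A dedicated generator seeded with the client id gives the same
    # matrix on every run without touching the global random state.
    rng = random.Random(str(client_id))
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def protect(feature, client_id, out_dim):
    """Assumed protection: project the feature with the user's matrix."""
    matrix = user_matrix(client_id, out_dim, len(feature))
    return [sum(w * x for w, x in zip(row, feature)) for row in matrix]
```

Calling `protect` twice with the same feature and client id yields the same protected vector, while a different client id yields a different matrix.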

Best,
Vedrana

Vedrana Krivokuca

Nov 6, 2017, 3:05:20 AM

to bob-devel

P.S. I just realised that I did not, in fact, fully explain why the client id is necessary for the protection in my initial post. Apologies!

Amir Mohammadi

Nov 6, 2017, 6:22:46 AM

to bob-...@googlegroups.com

Hi Vedrana,

Are you sure you will be satisfied with the protection mechanism implemented in the projection step?

Bob.bio.base does not support this kind of toolchain where you can have a protection step.

We are aware of the limitations and an issue exists for it: https://gitlab.idiap.ch/bob/bob.bio.base/issues/93

I am afraid what you are proposing here will be just a temporary hack inside bob.bio.base.

I have been thinking about the limitations of the framework myself too. I think the best place to hack into bob.bio.base is in the Preprocessors, since there you have access to the database biofiles!

Here is what I think could be done to implement the protection step with no changes inside bob.bio.base:

1- Run the normal recognition pipeline using verify.py.

2- Run a second pipeline with only a preprocessor that takes data from whatever step you choose (be it extracted or projected) and saves the protected version of it in a folder called protected.

3- Run the normal pipeline again, this time using the data from the protected folder instead of projected (or extracted).

Now the special preprocessor will look something like this:

from bob.bio.base.preprocessor import Preprocessor
from bob.bio.base import load


def load_projected_data(biofile, directory, extension):
    # directory will point to the "projected" folder here
    path = biofile.make_path(directory, extension)
    # You may need to call Algorithm.read_feature or something like that here instead. This is a simple version.
    data = load(path)
    return data, str(biofile.client_id)  # return the client_id here


class Protector(Preprocessor):
    def __init__(self, read_original_data=None):
        if read_original_data is None:
            read_original_data = load_projected_data
        super(Protector, self).__init__(read_original_data=read_original_data)

        self.client_seeds = {}  # keep track of the clients that you have seen.

    def __call__(self, data, annotations=None):
        data, client_id = data
        if client_id not in self.client_seeds:
            # my_new_seed and my_protect are placeholders for your own functions
            self.client_seeds[client_id] = my_new_seed()
        return my_protect(data, self.client_seeds[client_id])  # protect the data here with a client specific seed

Let me know if I am not clear.

Changing bob.bio.base to include step 2 above in its toolchain, so that it looks less hacky, would be more sensible IMHO.

Best,

Amir

Vedrana Krivokuca

Nov 6, 2017, 9:29:45 AM

to bob-devel

Hi Amir,

Thank you for your suggestion. It seems sensible, so I'll think about how it would work for my purposes.

However, any fix at this stage would be a hack. What if we had a "protect" method in our Algorithm class? Then I could get the client id in the "protect" function inside the algorithm.py script (as suggested in my initial post, except that this time everything would happen inside the "protect" function instead of the "project" function). That way the protection part would be more easily integrated into the verify pipeline. It could be optional, just like the "project" step is at the moment, so that others' work is not affected much.

Manuel Günther

Nov 6, 2017, 12:04:30 PM

to bob-devel

Hi Vedrana,

I think having a separate `protect` step makes a lot more sense than hacking something into `bob.bio.base`. At this point in time, however, that does not seem feasible for lack of time.

Maybe you can go with Amir's solution for now.

But, as you have said, you want to avoid running the preprocessing and extraction over and over again. Another less hackish and better supported option would be to obtain the client_id during preprocessing, similar to what is done in Amir's `load_projected_data` function. Then you can store this client_id in the preprocessed data file, write and read it with the extractor, and finally use it in the `project` step of the algorithm. While you have to change preprocessors and extractors for this, you might want to make use of the new `ParallelProcessor` that Amir has recently implemented. Maybe, Amir can help you out on how to use those classes.
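
The idea of carrying the client_id through the stages can be sketched like this. It is only an illustration under simplifying assumptions: bob.bio.base has its own file I/O, and JSON is used here purely to keep the example self-contained. The function names and the placeholder extraction are hypothetical.

```python
import json
import os
import tempfile


def write_with_client_id(path, data, client_id):
    """Store the data together with the client id it belongs to."""
    with open(path, "w") as handle:
        json.dump({"client_id": str(client_id), "data": list(data)}, handle)


def read_with_client_id(path):
    """Return (data, client_id) as written by write_with_client_id."""
    with open(path) as handle:
        record = json.load(handle)
    return record["data"], record["client_id"]


def extract(preprocessed_path, extracted_path):
    """Hypothetical extractor stage: transforms the data, forwards the id."""
    data, client_id = read_with_client_id(preprocessed_path)
    feature = [2.0 * value for value in data]  # placeholder extraction
    write_with_client_id(extracted_path, feature, client_id)


def round_trip_demo():
    """Preprocess -> extract -> read back, keeping the client id attached."""
    directory = tempfile.mkdtemp()
    preprocessed = os.path.join(directory, "preprocessed.json")
    extracted = os.path.join(directory, "extracted.json")
    write_with_client_id(preprocessed, [1.0, 2.0], 42)
    extract(preprocessed, extracted)
    return read_with_client_id(extracted)
```

With this layout the `project` step can read the client_id from the extracted file itself, so the extraction does not have to be repeated when the protection parameters change.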

Anyways, having a seed based on the ground-truth client_id makes sense for the enrollment step, where clients are usually known. For probe templates, I am not sure if making use of the client_id violates the protocol -- usually you do not know the client_id of a probe file. I hope you took care of that in your paper.

Manuel

Vedrana Krivokuca

Nov 7, 2017, 3:10:42 AM

to bob-devel

Hi Manuel,

Thank you for your suggestions.

I've still got some time for the paper package, so I would prefer to have the cleaner "protect" solution if possible. Not only would this look cleaner, but it would be useful for other parts of my work. How long do you think it would take to implement this method? I have started looking at the code and I think I more or less understand what needs to be done, but I'm worried that there are some intricacies/dependencies deeper inside bob.bio.base, which I am not aware of. So, I am wondering how you would feel about implementing the solution yourself? This may be faster as you are more familiar with the code, but if you do not want to or don't have time to do it, I could try.

As for your comment about the client ids ... In my case, using the client id does not violate the protocol. In fact, I consider two scenarios. The first is the "Normal" scenario. Here, it is assumed that each person's protected template is created using their own user-specific seed. The seed does not have to be the client id, but this was just an implementation choice to make it easier to reproduce the results. In practice, you would definitely not use the client id, but a randomly-generated secret number. The second scenario is the "Stolen Token" scenario. Here, it is assumed that the probe steals the model's user-specific seed and uses it with their own biometric to create the protected template. In this case, I just use the same seed for everyone (client id is not used). Hope that makes sense!
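
The two scenarios differ only in how the seed is chosen, which can be sketched as below. The function name, the scenario labels, and the value of `SHARED_SEED` are hypothetical and chosen for illustration.

```python
SHARED_SEED = "0"  # hypothetical value; any fixed seed would do


def seed_for(client_id, scenario):
    """Pick the seed used to protect a sample under the given scenario."""
    if scenario == "normal":
        # Each person uses their own user-specific seed (here: the client id).
        return str(client_id)
    if scenario == "stolen_token":
        # Everyone uses the same seed, as if the token had been stolen.
        return SHARED_SEED
    raise ValueError("unknown scenario: %r" % (scenario,))
```

In the "Normal" scenario two different clients get different seeds, while in the "Stolen Token" scenario the seed no longer depends on the client id at all.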

Best,
Vedrana

Manuel Günther

Nov 7, 2017, 11:57:17 AM

to bob-devel

Dear Vedrana,

First, I am not sure if I will find the time to implement such a solution. If you want to try it yourself, you can have a look into what I have implemented in `bob.bio.gmm`, where I have added some steps in order to parallelize the GMM training: https://gitlab.idiap.ch/bob/bob.bio.gmm/blob/master/bob/bio/gmm/script/verify_gmm.py

In a similar fashion, you could add another 'protect' step to your toolchain. However, I have to admit that this is currently very hackish and the code is spread out through several files. As Amir pointed out, we have an open issue about the complication of adapting the toolchain here: https://gitlab.idiap.ch/bob/bob.bio.base/issues/93, so it might become easier in the future.

On the other hand, it might also be useful to have a general `protect` step inside all algorithms in bob.bio.base. But this would require a major revamp of the whole toolchain. I think we might change that when we have solved the issue above.

In the meantime, I would suggest that you implement the new toolchain in your software package -- or choose one of the solutions we have proposed above.

Sorry that I cannot help you more.

Manuel

Vedrana Krivokuca

Nov 9, 2017, 10:42:05 AM

to bob-devel

Thank you for your reply, Manuel. Yes, I was thinking more along the lines of having a general "protect" step in bob.bio.base ...

I had a look at your suggested solution but, you are right, it does seem quite hackish and perhaps unnecessarily complicated for a solution that I might need to change anyway when we (hopefully!) integrate the protection step into the pipeline. I've decided to go with a different approach, one which is still necessarily a hack but seems simpler for my purposes (especially considering that it might be a temporary solution).

Thank you and Amir for your input!

Best,
Vedrana
