private dataset

hanan.sh...@gong.io

Jun 18, 2017, 2:06:22 AM
to bob-devel
New to bob-spear.
I am trying to figure out how one should use the package to train a UBM, enroll and test with proprietary wav files that are not part of a public dataset.
In other words, does bob.spear have documentation on how to set up a new dataset?
Best,
Hanan

Pavel Korshunov

Jun 18, 2017, 9:39:49 AM
to bob-...@googlegroups.com
Hi,

You can create file lists for your data in a specific format and set up a new database interface in a few lines. Please take a look here:
and here:

For example, once you create file lists, say inside folder 'folder_filelists', you can define a database for your experiments like so:

import bob.bio.base
from bob.bio.spear.database import AudioBioFile
from pkg_resources import resource_filename


class MyBeautyDB(bob.bio.base.database.FileListBioDatabase):
  """Wrapper class for the MyBeautyDB speaker database."""

  def __init__(self, original_directory="[MyBeautyDB_DATA_DIRECTORY]", original_extension=".wav"):
    # locate the folder holding the file lists, relative to this module
    folder = resource_filename(__name__, '../folder_filelists')
    # call base class constructor
    super(MyBeautyDB, self).__init__(folder, 'db_name', bio_file_class=AudioBioFile,
                                     original_directory=original_directory,
                                     original_extension=original_extension)

# instance to be used from the configuration scripts
database = MyBeautyDB()

Now you can use the 'database' object inside the configuration scripts that you pass to verify.py or train.py (or train_gmm.py, train_isv.py, etc.) of bob.bio.spear.
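
As an illustration of the "create file lists" step, here is a minimal sketch that writes the list files with plain Python. The folder layout (norm/train_world.lst, dev/for_models.lst, dev/for_probes.lst) and the column order are assumptions based on the file list database guide of bob.bio.base and should be checked against the documentation of the installed version. The helper name write_filelists is hypothetical and not part of bob.

```python
from pathlib import Path


def write_filelists(root, world, models, probes, group="dev"):
    """Write file lists in the layout ASSUMED for FileListBioDatabase.

    world, probes: iterables of (path_without_extension, client_id)
    models:        iterables of (path_without_extension, model_id, client_id)
    Paths are meant to be relative to original_directory.
    """
    root = Path(root)
    # Assumed file names; verify them against the bob.bio.base docs.
    lists = {
        root / "norm" / "train_world.lst": world,
        root / group / "for_models.lst": models,
        root / group / "for_probes.lst": probes,
    }
    for path, rows in lists.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        # one sample per line, fields separated by a single space
        lines = [" ".join(str(field) for field in row) for row in rows]
        path.write_text("\n".join(lines) + "\n")
    return sorted(lists)
```

Calling write_filelists('folder_filelists', world, models, probes) would then produce the folder that the MyBeautyDB constructor above points to.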

I hope it helps.

Pavel

--
-- You received this message because you are subscribed to the Google Groups bob-devel group. To post to this group, send email to bob-...@googlegroups.com. To unsubscribe from this group, send email to bob-devel+unsubscribe@googlegroups.com. For more options, visit this group at https://groups.google.com/d/forum/bob-devel or directly the project website at http://idiap.github.com/bob/
---
You received this message because you are subscribed to the Google Groups "bob-devel" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bob-devel+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Dr. Pavel Korshunov
Biometric group
Idiap Research Institute
Rue Marconi 19
CH - 1920 Martigny
Switzerland

Room: 207

Hanan Shteingart

Jun 18, 2017, 9:51:59 AM
to bob-...@googlegroups.com
Thanks,
Can you please give a more detailed example?
Say I have n wav files with m(n) segments in each file (start and end in seconds, plus a speaker ID). How would one enter this information into the dataset?
Best,
Hanan




--
Dr. Hanan Shteingart | Data Scientist | Gong.io
P: +972-54-2271572
M: hanan.sh...@gong.io
W: www.gong.io
Love Gong? Share the love and help us grow.

Manuel Günther

Jun 25, 2017, 7:28:23 PM
to bob-devel
Hi Hanan,

I am afraid that it is currently not possible to have samples from different speakers in the same file. The only solution I can think of is to split the data with an external application and store the relevant parts in separate files, one per speaker and sample.
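
The splitting step could be sketched with Python's standard wave module as follows. This is only an illustration, assuming uncompressed PCM wav input and segments given as (start_seconds, end_seconds, speaker_id) tuples; the function name split_wav and the output naming scheme (one folder per speaker, one file per segment) are hypothetical choices, not something bob prescribes.

```python
import wave
from pathlib import Path


def split_wav(wav_path, segments, out_dir):
    """Write one wav file per (start_sec, end_sec, speaker_id) segment.

    Returns the list of written paths, in segment order.
    """
    wav_path = Path(wav_path)
    out_dir = Path(out_dir)
    written = []
    with wave.open(str(wav_path), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for index, (start, end, speaker) in enumerate(segments):
            # convert the segment boundaries from seconds to frame indices
            first = int(round(start * rate))
            count = int(round(end * rate)) - first
            src.setpos(first)
            frames = src.readframes(count)
            # one folder per speaker, one file per segment
            target = out_dir / str(speaker) / f"{wav_path.stem}_{index:03d}.wav"
            target.parent.mkdir(parents=True, exist_ok=True)
            with wave.open(str(target), "wb") as dst:
                # same channels, sample width and rate as the source;
                # the frame count in the header is fixed up on close
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(target)
    return written
```

The resulting per-speaker files can then be listed in the file lists of a file list database, with the speaker folder name serving as the client ID.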

Best regards
Manuel

Amir Mohammadi

Jul 5, 2017, 9:51:52 AM
to bob-devel
Hi,

It is possible to have a database interface that handles your files, but pre-processing your data as Manuel explained may be easier.
To do so, you need to create your own database interface instead of using the filelist database interface. This is best explained in the bob.db.base docs: https://www.idiap.ch/software/bob/docs/bob/bob.db.base/master/ (especially the development guide) and then in https://www.idiap.ch/software/bob/docs/bob/bob.bio.base/master/implementation.html#databases .
Another example is the bob.db.nist_sre12 database, where each file contains two samples: https://gitlab.idiap.ch/bob/bob.db.nist_sre12/blob/14dd6ce6b3bc3ec1cce8b5eb858c0c21760dab4a/bob/db/nist_sre12/models.py#L124 and its high-level interface in https://gitlab.idiap.ch/bob/bob.bio.spear/tree/add_nistsre12_db_

But again, it's easier to pre-process your files and create one file per sample, per speaker.

Best,
Amir
