Using washed up CASIA-webface dataset


Dong Xu

Mar 31, 2016, 3:32:00 AM
to CMU-OpenFace
From https://github.com/happynear/FaceVerification, where Feng Wang wrote:
"The CASIA-webface dataset is really very dirty, and I believe that if someone could wash it up, the accuracy would increase further. If you did so, please kindly contact me. I will pay for it.

Good News: @潘泳苹果皮 and his colleagues have washed the CASIA-webface database manually. After washing, 27703 wrong images are deleted. The washed list can be downloaded from http://pan.baidu.com/s/1kUdRRJT with password 3zbb. Great thanks to them!"

Brandon Amos

Mar 31, 2016, 2:40:13 PM
to Dong Xu, CMU-OpenFace
Hi, thanks for sharing this.
I look forward to experimenting with this dataset:

https://github.com/cmusatyalab/openface/issues/119

-Brandon.

Dante Knowles

May 26, 2016, 2:14:50 PM
to CMU-OpenFace, app...@gmail.com, ba...@cs.cmu.edu
Did you ever download this set? I couldn't manage to get the download to complete.

Dong Xu

May 27, 2016, 10:09:46 PM
to CMU-OpenFace
The new URL is http://pan.baidu.com/s/1kUUP0IN


On Friday, May 27, 2016 at 2:14:50 AM UTC+8, Dante Knowles wrote:

Ilya Shapiro

May 29, 2016, 5:14:18 AM
to CMU-OpenFace
Is there a list of identities for this dataset? Each identity is labeled only with a number. I would like to use several databases for training and to remove identities that appear in more than one database.

Brandon Amos

Jun 3, 2016, 4:49:49 PM
to CMU-OpenFace
I'm also interested in merging the washed-up CASIA-WebFace dataset with others. Having explicit identities would make this easiest. If the identities aren't available for some reason, we could try the following:
  1. Use perceptual hashes (phashes, https://github.com/JohannesBuchner/imagehash) to map the washed-up images back to the original images and recover the identities from there.
  2. Use the current OpenFace model, which I think is accurate enough, to help guide the merging of arbitrary datasets.
If the images are very similar, I think using phashes will be easiest, since CASIA-WebFace has identity information.
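
A minimal sketch of the hash-based mapping, assuming every image has already been reduced to an integer hash. With the imagehash library those hashes would presumably come from something like `imagehash.phash(Image.open(path))`; to keep this example dependency-free, a simplified average hash over a grayscale pixel grid stands in for it. The names `average_hash`, `hamming`, `map_identities`, and the `max_distance` cutoff are all illustrative, not part of any library.

```python
from typing import Dict, Optional, Sequence, Tuple


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Simplified average hash: one bit per pixel, set when the pixel exceeds the mean.

    This is a stand-in for a real perceptual hash and expects a small,
    already-downscaled grayscale grid.
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def map_identities(
    washed: Dict[str, int],
    original: Dict[str, Tuple[int, str]],
    max_distance: int = 4,
) -> Dict[str, Optional[str]]:
    """Map each washed image to the identity of its nearest original image.

    washed:   image name -> hash
    original: image name -> (hash, identity)
    Images with no original within max_distance bits map to None.
    """
    result: Dict[str, Optional[str]] = {}
    for name, image_hash in washed.items():
        best_identity: Optional[str] = None
        best_distance = max_distance + 1
        for other_hash, identity in original.values():
            distance = hamming(image_hash, other_hash)
            if distance < best_distance:
                best_identity, best_distance = identity, distance
        result[name] = best_identity
    return result


if __name__ == "__main__":
    original = {"a.jpg": (average_hash([[10, 200], [220, 30]]), "person_1")}
    washed = {
        "w1.jpg": average_hash([[12, 198], [221, 29]]),   # near-duplicate of a.jpg
        "w2.jpg": average_hash([[200, 10], [30, 220]]),   # unrelated image
    }
    print(map_identities(washed, original, max_distance=1))
```

The brute-force loop compares every washed image with every original, which is quadratic. For a dataset of this size it would probably be better to bucket the hashes first, for example by exact match, and only fall back to the distance search for the leftovers.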

-Brandon.

Ilya Shapiro

Jun 5, 2016, 4:10:36 AM
to CMU-OpenFace
Hi,
Here is a working link that I used:
http://pan.baidu.com/share/link?shareid=4139272429&uk=1543819581
The connection is really bad if you try to download the data without a Baidu account and the Baidu download manager "BaiduYunGuanjia". That was the only way I managed to download it.
I do not have the original database, so I have no identity labels beyond the fact that each person's images are in a separate folder.
All the images are cropped to 250x250. Is the data in the original CASIA-WebFace dataset in the same format? If not, I do not think imagehash can help recover the labels in this case. We could use OpenFace to do the matching instead.
Can you send me the list of the original identities? It would help me determine which identities to look for.
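
A rough sketch of the embedding-based matching, assuming a face embedding has already been computed for every image (for example with OpenFace) and grouped by identity folder. Averaging the embeddings per identity, the function names, and the threshold value are all illustrative assumptions, not something taken from OpenFace itself.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of embeddings."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def match_identities(
    set_a: Dict[str, Sequence[Vector]],
    set_b: Dict[str, Sequence[Vector]],
    threshold: float,
) -> List[Tuple[str, str, float]]:
    """Pair each identity in set_a with its nearest identity in set_b.

    Both arguments map an identity label to that person's embeddings.
    A pair is reported only if the centroid distance is below the
    (hypothetical) threshold, which would need tuning on real data.
    """
    centroids_b = {label: centroid(vecs) for label, vecs in set_b.items()}
    matches: List[Tuple[str, str, float]] = []
    for label_a, vecs in set_a.items():
        center_a = centroid(vecs)
        best_label, best_distance = None, float("inf")
        for label_b, center_b in centroids_b.items():
            distance = squared_l2(center_a, center_b)
            if distance < best_distance:
                best_label, best_distance = label_b, distance
        if best_label is not None and best_distance < threshold:
            matches.append((label_a, best_label, best_distance))
    return matches


if __name__ == "__main__":
    washed = {"0000045": [[1.0, 0.0], [0.9, 0.1]], "0000099": [[-1.0, -1.0]]}
    other = {"x": [[0.95, 0.05]], "y": [[0.0, 1.0]]}
    for pair in match_identities(washed, other, threshold=0.5):
        print(pair)
```

Any pair found this way is only a candidate; a manual check of a few images per matched identity would still be sensible before removing anything.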

Ilya

Brandon Amos

Jun 6, 2016, 11:57:11 AM
to CMU-OpenFace
Hi Ilya,
 
> Can you send me the list of the original identities?

I've attached the list of names. I just noticed that the first column in this file is a numeric identifier for each person's identity. Is this the number used for the washed-up CASIA identities?

-Brandon.
names.txt

Ilya Shapiro

Jun 7, 2016, 4:05:43 AM
to CMU-OpenFace
Yes it is!
That is great, because the problem of merging the datasets is reduced to finding similar images among the overlapping identities.
Did the link that I provided work for you?
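
A small sketch of how the overlapping identities could be found from the name list. The file layout is an assumption: one identity per line, a numeric identifier followed by whitespace and the person's name. The second dataset's names, the `normalize` rule, and the function names are hypothetical.

```python
from typing import Dict, Iterable


def parse_names(text: str) -> Dict[str, str]:
    """Parse lines of '<numeric id> <name>' into {id: name}, skipping malformed lines."""
    identities: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            identities[parts[0]] = parts[1].strip()
    return identities


def normalize(name: str) -> str:
    """Lowercase and drop everything but letters and digits, so that
    'Some_Person' and 'some person' compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def overlapping_identities(
    casia: Dict[str, str], other_names: Iterable[str]
) -> Dict[str, str]:
    """Return the CASIA identities whose normalized name also occurs in the other dataset."""
    other = {normalize(name) for name in other_names}
    return {
        identity: name
        for identity, name in casia.items()
        if normalize(name) in other
    }


if __name__ == "__main__":
    sample = "0000045 Some_Person\n0000099 Another Person\n"
    casia = parse_names(sample)
    print(overlapping_identities(casia, ["some person", "Third One"]))
```

Name matching of this kind will miss spelling variants and can confuse different people who share a name, so it only narrows down which identity folders need an image-level comparison.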

-Ilya

Majid Azimi

Oct 19, 2016, 5:14:53 AM
to CMU-OpenFace
Hi,

None of the links work.

Best,
Majid

beimin...@gmail.com

Dec 20, 2017, 7:00:22 AM
to CMU-OpenFace
The zip file downloaded from this link does not decompress correctly; there are many errors:

file #147386:  bad zipfile offset (local header sig):  4164055149
file #147387:  bad zipfile offset (lseek):  4164059136
file #147388:  bad zipfile offset (lseek):  4164075520
file #147389:  bad zipfile offset (EOF):  4164083219
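
Offset errors like these often point to an incomplete or truncated download, though that is only a guess here. A minimal sketch for checking an archive before unpacking it, using Python's standard `zipfile` module; the helper name `check_zip` is hypothetical.

```python
import zipfile


def check_zip(path_or_file):
    """Return None if the archive looks intact, else a short description of the problem.

    Accepts a filesystem path or a binary file-like object.
    """
    try:
        with zipfile.ZipFile(path_or_file) as archive:
            # testzip() reads every member and returns the name of the
            # first one that fails its CRC or header check, or None.
            bad_member = archive.testzip()
    except zipfile.BadZipFile as exc:
        return f"not a readable zip archive: {exc}"
    if bad_member is not None:
        return f"first corrupt member: {bad_member}"
    return None


if __name__ == "__main__":
    import sys

    for archive_path in sys.argv[1:]:
        problem = check_zip(archive_path)
        print(archive_path, "OK" if problem is None else problem)
```

If the check fails, downloading the file again and comparing its size with the size shown on the share page is probably the first thing to try.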