ROSE Lab

ROSE-Youtu Face Liveness Detection Dataset

We introduce a new and comprehensive face anti-spoofing database, the ROSE-Youtu Face Liveness Detection Database (ROSE-Youtu), which covers a large variety of illumination conditions, camera models, and attack types. ROSE-Youtu consists of 4,225 videos of 25 subjects in total, of which 3,350 videos of 20 subjects (5.45 GB) are publicly available. It also includes a new Client-Specific One-Class Domain Adaptation Protocol with an additional 1.25 GB of pre-processed data.

For each subject, there are 150-200 video clips with an average duration of around 10 seconds. Five mobile devices were used to collect the database: (a) a Hasee smartphone (resolution 640 × 480), (b) a Huawei smartphone (resolution 640 × 480), (c) an iPad 4 (resolution 640 × 480), (d) an iPhone 5s (resolution 1280 × 720) and (e) a ZTE smartphone (resolution 1280 × 720). All face videos are captured with the front-facing camera. The standoff distance between the face and the camera is about 30-50 cm.

For genuine face videos, each subject normally has 25 videos (5 devices × 5 scenes). The scenes cover 5 different illumination conditions in an office environment. If the client wears eyeglasses, there are an additional 25 genuine videos.

We consider three types of spoofing attack: printed paper attacks, video replay attacks, and mask attacks. For printed paper attacks, face images printed on A4 paper are presented either held still or quivering. For video replay attacks, a face video is displayed on a Lenovo LCD screen and on a Mac screen. For mask attacks, masks with and without cropping are considered. Moreover, the face videos are captured against different backgrounds, so that they cover different illumination conditions. To stay consistent with the genuine face videos, the standoff distance between the spoofing medium and the camera is also about 30-50 cm.

Evaluation Protocols

1. Intra-dataset Evaluation Protocol

We divide the ROSE-Youtu Database into training and testing subsets. Videos of the first 10 subjects by index (subject IDs 2, 3, 4, 5, 6, 7, 9, 10, 11 and 12) are used for training, and the videos of the remaining subjects are used for testing.
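A minimal Python sketch of this split, assuming the file names follow the L_S_D_x_E_p_N template described under Naming Details, that no section contains an extra ‘_’, and that the person ID ‘p’ is a plain integer. The helper names `person_id` and `intra_dataset_split` and the example file names are hypothetical.

```python
from pathlib import Path

# Training subjects listed in the intra-dataset protocol.
TRAIN_SUBJECTS = {2, 3, 4, 5, 6, 7, 9, 10, 11, 12}


def person_id(filename: str) -> int:
    """Return the person ID 'p' from a name following L_S_D_x_E_p_N.

    Assumes 'p' is the sixth '_'-separated section and is numeric.
    """
    sections = Path(filename).stem.split("_")  # stem drops the '.mp4' extension
    if len(sections) != 7:
        raise ValueError(f"unexpected file name: {filename!r}")
    return int(sections[5])


def intra_dataset_split(filenames):
    """Split video file names into (train, test) lists by subject ID."""
    train, test = [], []
    for name in filenames:
        (train if person_id(name) in TRAIN_SUBJECTS else test).append(name)
    return train, test
```

For example, `intra_dataset_split(["G_NT_5s_g_E_2_1.mp4", "Ps_T_HW_wg_E_23_3.mp4"])` puts the first name (subject 2) in the training list and the second (subject 23) in the testing list. Splitting by subject rather than by video keeps every identity on one side of the split.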

Please see the table below for evaluation results on the ROSE-Youtu Database. Performance is measured by the Equal Error Rate (EER).
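The EER is the error rate at the decision threshold where the false acceptance rate (attacks accepted as genuine) equals the false rejection rate (genuine faces rejected). The sketch below approximates it by trying every observed score as a threshold. It assumes higher scores mean "more likely genuine"; the function name is hypothetical, and this is not necessarily how the reported results were computed (for instance, whether scores are frame-level or video-level is not specified here).

```python
def equal_error_rate(genuine_scores, attack_scores):
    """Approximate the EER from liveness scores (higher = more likely genuine).

    A sample is accepted as genuine when its score >= threshold. Every
    observed score is tried as a threshold, and the mean of FAR and FRR is
    returned at the threshold where the two rates are closest.
    """
    if not genuine_scores or not attack_scores:
        raise ValueError("need at least one genuine and one attack score")
    best_gap, best_eer = float("inf"), None
    for t in sorted(set(genuine_scores) | set(attack_scores)):
        # FAR: fraction of attack samples wrongly accepted as genuine.
        far = sum(s >= t for s in attack_scores) / len(attack_scores)
        # FRR: fraction of genuine samples wrongly rejected.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer
```

For example, `equal_error_rate([0.9, 0.8, 0.7, 0.3], [0.1, 0.2, 0.4, 0.85])` returns 0.25: at threshold 0.7 one of four attacks is accepted and one of four genuine samples is rejected.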

2. Client-Specific One-Class Domain Adaptation Protocol (NEW)

We sample 10 subjects (clients) from the ROSE-Youtu Database and devise a client-specific one-class domain adaptation task for each client.

Each task contains 50 genuine face videos and 110 attack videos. We uniformly select 25 of the genuine face videos as training (adaptation) videos. The remaining 25 genuine face videos, together with all 110 attack videos, are used as testing videos. We sample 1 frame from each training video as training (adaptation) data, and 50 frames from each testing video as testing data. (The pre-processed data are provided.)
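The per-client split and frame sampling can be sketched in Python as follows. The protocol does not state exactly how the 25 adaptation videos are chosen or which frames are taken, so this sketch assumes every other genuine video (after sorting by name) and evenly spaced frame indices; the provided pre-processed data remain the reference for reported results. All function names and the 1/0 genuine/attack labels are hypothetical.

```python
def split_client_videos(genuine_videos, attack_videos):
    """Build (train, test) lists for one client-specific task.

    Assumption: 'uniformly select' is read as taking every other genuine
    video after sorting; the released split may differ.
    """
    genuine = sorted(genuine_videos)
    train = genuine[0::2]          # 25 of the 50 genuine videos for adaptation
    test_genuine = genuine[1::2]   # remaining 25 genuine videos
    # Test items are (video, label) pairs: 1 = genuine, 0 = attack.
    test = [(v, 1) for v in test_genuine] + [(v, 0) for v in attack_videos]
    return train, test


def sample_frame_indices(num_frames, k):
    """Return up to k frame indices spread evenly over [0, num_frames)."""
    if k <= 0 or num_frames <= 0:
        return []
    k = min(k, num_frames)
    step = num_frames / k
    # Take the centre of each of the k equal-length segments.
    return [int(i * step + step / 2) for i in range(k)]
```

With this sketch, a training video contributes `sample_frame_indices(n, 1)` (its middle frame) and a testing video contributes `sample_frame_indices(n, 50)`.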

Usage for Academic Research

1. Usage for Academic Research
This video database is the result of a collaboration between Tencent Corporation and the NTU ROSE Lab. It is released for academic research only, and is free to researchers from educational or research institutes for non-commercial purposes.

2. Release Agreement
The use of this dataset is governed by the following terms and conditions:

a. Without the express permission of the ROSE Lab, any of the following will be considered illegal: redistribution, derivation or generation of a new dataset from this dataset, and commercial usage of this dataset in any way or form, either partially or in its entirety.
b. For the sake of privacy, images of all subjects in this dataset are only allowed for demonstration in academic publications and presentations.
c. All users of the ROSE-Youtu Face Liveness Detection dataset agree to indemnify, defend and hold harmless the ROSE Lab and its officers, employees, and agents, individually and collectively, from any and all losses, expenses, and damages.
d. Tencent Corporation may have patents, patent applications, trademarks, copyrights, and other intellectual property rights covering this dataset. Provision of this dataset does not give you any license under these patents, trademarks, copyrights, or other intellectual property rights.

If interested, researchers can register for an account, submit the request form and accept the Release Agreement. We will validate your request and grant approval for downloading the datasets.

Related Publications

Please cite the following reference if you use the dataset:

Haoliang Li, Wen Li, Hong Cao, Shiqi Wang, Feiyue Huang and Alex C. Kot, “Unsupervised Domain Adaptation for Face Anti-Spoofing”, IEEE Transactions on Information Forensics and Security, 2018.
Zhi Li, Rizhao Cai, Haoliang Li, Kwok-Yan Lam, Yongjian Hu and Alex C. Kot, “One-Class Knowledge Distillation for Face Presentation Attack Detection”, IEEE Transactions on Information Forensics and Security, 2022.

Examples of ROSE-Youtu Face Liveness Detection Database (ROSE-Youtu)

From top to bottom: face images in the genuine, cropped mask, full mask, upper mask, paper print and video replay versions. (For the paper print attack, both warped paper and still paper attacks are considered.) From left to right: face images captured by an iPhone 5s, a Hasee mobile phone, a Huawei mobile phone, an iPad and a ZTE mobile phone.

Naming Details

These videos were recorded from a total of 20 persons and are provided in MP4 format. Each MP4 file is named according to the template L_S_D_x_E_p_N (seven sections joined by ‘_’). Each section is described as follows:

1. The first section ‘L’ can be any one of the following 9 strings:
   1) G - indicates a genuine person.
   2) Ps - indicates a still printed paper.
   3) Pq - indicates a quivering printed paper.
   4) Vl - indicates a video that records a Lenovo LCD display.
   5) Vm - indicates a video that records a Mac LCD display.
   6) Mc - indicates a paper mask with the two eyes and the mouth cropped out.
   7) Mf - indicates a paper mask without cropping.
   8) Mu - indicates a paper mask with the upper part cut in the middle.
   9) Ml - indicates a paper mask with the lower part cut in the middle.
2. The second section ‘S’ can be any one of the following 2 strings:
   1) T - indicates the subject is speaking during recording.
   2) NT - indicates the subject is not speaking during recording.
3. The third section ‘D’ indicates the device that recorded the video, and can be any one of the following 5 strings:
   1) HS - indicates the video is recorded by a Hasee smartphone.
   2) HW - indicates the video is recorded by a Huawei smartphone.
   3) IP - indicates the video is recorded by an iPad.
   4) 5s - indicates the video is recorded by an iPhone 5s.
   5) ZTE - indicates the video is recorded by a ZTE smartphone.
4. The fourth section ‘x’ indicates whether the subject is wearing eyeglasses, and can be any one of the following 2 strings:
   1) g - indicates the subject is wearing glasses.
   2) wg - indicates the subject is not wearing glasses.
5. The fifth section ‘E’ indicates the background environment in which the video was recorded. This section is reserved and has not been set yet.
6. The sixth section ‘p’ indicates the person ID.
7. The last section ‘N’ indicates the file index number.
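To check file names against this template, a parser can be sketched in Python as below. It assumes no section itself contains ‘_’ and keeps ‘E’, ‘p’ and ‘N’ as raw strings, since their exact formats are not specified; the class, function and table names, and the example file names, are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path

# Code tables transcribed from the naming details above.
LABELS = {
    "G": "genuine person",
    "Ps": "still printed paper",
    "Pq": "quivering printed paper",
    "Vl": "video replay on a Lenovo LCD display",
    "Vm": "video replay on a Mac LCD display",
    "Mc": "paper mask, eyes and mouth cropped out",
    "Mf": "paper mask, no cropping",
    "Mu": "paper mask, upper part cut",
    "Ml": "paper mask, lower part cut",
}
DEVICES = {"HS": "Hasee", "HW": "Huawei", "IP": "iPad", "5s": "iPhone 5s", "ZTE": "ZTE"}


@dataclass(frozen=True)
class VideoName:
    label: str        # L: one of the LABELS codes
    talking: bool     # S: True for 'T', False for 'NT'
    device: str       # D: one of the DEVICES codes
    glasses: bool     # x: True for 'g', False for 'wg'
    environment: str  # E: reserved, kept verbatim
    person_id: str    # p: kept as a string (format assumed)
    index: str        # N: kept as a string (format assumed)

    @property
    def is_genuine(self) -> bool:
        return self.label == "G"


def parse_video_name(filename: str) -> VideoName:
    """Parse a file name of the form L_S_D_x_E_p_N(.mp4)."""
    sections = Path(filename).stem.split("_")
    if len(sections) != 7:
        raise ValueError(f"expected 7 sections in {filename!r}")
    label, speaking, device, glasses, env, person, index = sections
    if (label not in LABELS or speaking not in ("T", "NT")
            or device not in DEVICES or glasses not in ("g", "wg")):
        raise ValueError(f"unknown code in {filename!r}")
    return VideoName(label, speaking == "T", device, glasses == "g", env, person, index)
```

For example, `parse_video_name("Mc_T_IP_g_E_12_7.mp4")` describes a cropped paper mask recorded on an iPad while talking, with glasses, for person "12"; `LABELS[...]` and `DEVICES[...]` give readable descriptions of the codes.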