Datasets
We have been collecting datasets and conducting baseline and advanced personal-identification studies using biometric measurements. We are committed to releasing all collected data to eligible research groups, with appropriate controls to prevent online distribution outside the research community.
If you are interested in obtaining any of the biometric datasets described below, please follow these instructions:
- Download all applicable license agreements. Several of our datasets require more than one license agreement.
- Have the license agreement reviewed and signed by an individual authorized to make legal commitments in the name of your organization.
- For university licensees: we cannot accept licenses signed by students or postdoctoral scholars under any circumstances, and we cannot accept licenses signed by faculty members unless they have been explicitly delegated the authority to make contracts on behalf of the institution. Your institution's legal or contracting office must review and execute the license.
- Return the properly signed license agreement via your INSTITUTIONAL e-mail address (we cannot accept license agreements sent through third party e-mail providers) to cvrl@nd.edu. You may also fax requests to +1 574 631 9260, attention J. Dhar.
- Include in the e-mail/cover page the full name, title, address and phone number of the institution and institutional point of contact.
The Synthetic Forensic Iris (UND-SFI-2024) dataset contains synthetic iris images that resemble those captured from deceased subjects by equipment compliant with ISO/IEC 19794-6. The data is categorized into 18 disjoint ranges of post-mortem interval (PMI). Each range contains 10,000 images representing 1,000 non-existent identities, with 10 images per “identity” that may be treated as same-eye images.
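With 10 images per identity, each identity yields C(10, 2) = 45 genuine (same-eye) pairs. The grouping can be sketched as below; the filename pattern is purely hypothetical (the dataset's actual naming convention is not described here) and only 3 identities per range are generated for brevity:

```python
from itertools import combinations

# Hypothetical filenames: pmi<range>_id<identity>_s<sample>.png
filenames = [
    f"pmi{r:02d}_id{i:04d}_s{s:02d}.png"
    for r in range(18) for i in range(3) for s in range(10)  # 3 ids shown for brevity
]

# Group images by (PMI range, identity) to form genuine same-eye pairs.
groups = {}
for name in filenames:
    key = name.rsplit("_", 1)[0]          # e.g. "pmi03_id0001"
    groups.setdefault(key, []).append(name)

# Each identity contributes C(10, 2) = 45 genuine pairs.
genuine_pairs = [p for imgs in groups.values() for p in combinations(imgs, 2)]
```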
The Masked Physiological Monitoring (MPM) dataset contains 159 video recordings from 54 human subjects wearing protective face coverings. Each recording consists of a 1920x1080-resolution, losslessly compressed RGB video recorded at 90 frames per second, with simultaneous PPG collected from two fingertip oximeters. Each recording lasts at least 3 minutes, during which subjects converse, move their head, and sit still, resulting in over 8 hours of data.
The Multi-Site Physiological Monitoring (MSPM) dataset consists of 103 sessions, each lasting just over 14 minutes on average, in which human subjects engage in a variety of activities designed either to elicit interesting physiological phenomena (such as a breath hold to increase blood pressure) or to provide a challenging context for remote photoplethysmography (rPPG), such as an adversarial attack. Sessions were recorded in RGB from three different angles and in near-infrared zoomed in on the eyes, along with cardiac pulse at ten sites across the body, blood oxygenation, and blood pressure from a cuff-based monitor.
The UND AAAI 2023 Dataset contains (a) images of live (authentic) faces, (b) images of faces synthetically generated by deep learning-based generative adversarial networks, and (c) regions annotated by humans solving the synthetic face detection task, indicating features supporting their decisions.
This dataset contains modified samples from the Flickr-Faces-HQ (FFHQ), made available under Creative Commons BY-NC-SA 4.0 license by NVIDIA Corporation (https://github.com/NVlabs/ffhq-dataset/blob/master/LICENSE.txt).
According to that license, one is allowed to redistribute and adapt FFHQ samples for non-commercial purposes, as long as one (a) gives appropriate credit by citing the FFHQ creators’ paper, (b) indicates any changes made, and (c) distributes any derivative works under the same license. In response to these requirements, we (a) cited the paper indicated at https://github.com/NVlabs/ffhq-dataset in the paper publishing the UND AAAI 2023 Dataset, (b) note that the modifications made to the original FFHQ samples consist of cropping the image around the detected face and rescaling the cropped samples to 224x224 pixel resolution, and (c) distribute the derivative work as the AAAI 2023 paper.
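The crop-and-rescale preprocessing mentioned above can be sketched as follows. The bounding box is assumed to come from an external face detector, and nearest-neighbor sampling is used here only for illustration; the detector and resampling filter actually used for the dataset are not specified:

```python
import numpy as np

def crop_and_rescale(img, box, size=224):
    """Crop img (H x W x 3 uint8 array) to a face bounding box and
    rescale it to size x size via nearest-neighbor index sampling.
    box = (top, left, bottom, right), e.g. from a face detector."""
    top, left, bottom, right = box
    face = img[top:bottom, left:right]
    # Pick size evenly spaced row/column indices from the cropped region.
    rows = np.linspace(0, face.shape[0] - 1, size).round().astype(int)
    cols = np.linspace(0, face.shape[1] - 1, size).round().astype(int)
    return face[np.ix_(rows, cols)]
```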
The UND WACV 2023 CYBORG Dataset contains (a) images of live (authentic) faces, (b) images of faces synthetically generated by deep learning-based generative adversarial networks, and (c) regions annotated by humans solving the synthetic face detection task, indicating features supporting their decisions.
This dataset contains modified samples from the Flickr-Faces-HQ (FFHQ), made available under Creative Commons BY-NC-SA 4.0 license by NVIDIA Corporation (https://github.com/NVlabs/ffhq-dataset/blob/master/LICENSE.txt). According to that license, one is allowed to redistribute and adapt FFHQ samples for non-commercial purposes, as long as one (a) gives appropriate credit by citing the FFHQ creators’ paper, (b) indicates any changes made, and (c) distributes any derivative works under the same license. In response to these requirements, we (a) cited the paper indicated at https://github.com/NVlabs/ffhq-dataset in the paper publishing the UND WACV 2023 CYBORG Dataset, (b) note that the modifications made to the original FFHQ samples consist of cropping the image around the detected face and rescaling the cropped samples to 224x224 pixel resolution, and (c) distribute the derivative work as the WACV 2023 paper. The UND WACV 2023 CYBORG Dataset also contains modified samples distributed under the Creative Commons BY-NC 4.0 license (https://creativecommons.org/licenses/by-nc/4.0/legalcode).
According to the request of the licensor (https://github.com/tkarras/progressive_growing_of_gans), one is allowed to use any of the material in their own work, as long as appropriate credit is given to the creators by mentioning the title and author list of their paper: Tero Karras, Timo Aila, Samuli Laine, Jaakko Lehtinen, “Progressive Growing of GANs for Improved Quality, Stability, and Variation,” ICLR 2018.
LivDet-Iris-2023-Part1-Notre Dame License Agreement, LivDet-Iris-2023-Part1-Clarkson License Agreement
The LivDet-Iris-2023 dataset contains images of live (authentic) irises and images of irises synthetically generated by deep learning-based generative adversarial networks. The primary goal of creating and sharing this dataset is to allow researchers to participate in the LivDet-Iris 2023 competition by delivering to the organizers the presentation attack detection scores associated with these images. After the LivDet-Iris 2023 competition concludes, this dataset may serve as a useful benchmark for comparing future solutions with those submitted to the competition.
All data is de-identified. Assembly of this data set was supported by the US National Institute of Standards and Technology.
The BVC-UNN-face data set was collected by the Biometrics Vision and Computing (BVC) group at the University of Nigeria. It includes a database of face images of Nigerians.
The BVC group through the University of Nigeria retains ownership and copyright of the BVC-UNN-face dataset. This data is distributed via the University of Notre Dame upon receipt of a properly executed copy of the license agreement.
For details on publishable images, please click here.
This dataset may be useful for studying accuracy differences across female / male demographics.
Data Type: RGB Face Video, Pulse waveforms and Heart rate, Approximate Download Size: 7 TB
This dataset overlaps with DDPM specifically for remote pulse detection. It consists of losslessly compressed RGB videos and ground-truth pulse waveforms and heart rate (HR) for 86 subjects. The data was collected in an interview scenario with subjects freely moving, talking, and exhibiting facial expressions. Each video lasts around 10 minutes, recorded at 90 frames per second, giving several million visible-light frames at 1920x1080 resolution. Pulse data was interpolated to the video sampling rate, so that each frame has a waveform and HR pair. Subject metadata describing age, gender, race, and ethnicity is included. Predefined train, validation, and test splits are also included for comparison with the results presented in the original paper.
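The interpolation step described above (aligning the pulse waveform to the 90 fps frame timestamps) can be sketched as below; the function name and linear interpolation are illustrative assumptions, not the dataset authors' exact procedure:

```python
import numpy as np

def align_pulse_to_frames(pulse_t, pulse_v, fps, n_frames):
    """Linearly interpolate a pulse waveform, sampled at times pulse_t
    with values pulse_v, onto video frame timestamps (frame k at k / fps),
    so every frame gets one waveform value."""
    frame_t = np.arange(n_frames) / fps
    return np.interp(frame_t, pulse_t, pulse_v)
```

With this alignment, a per-frame HR label can be derived the same way from the oximeter's HR time series.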
Data Type: Video with corresponding force plate data, Approximate Download Size: 20 GB
This dataset consists of videos of 89 female athletes performing 582 evaluative jumps for the purpose of predicting ACL injury risk. The dataset includes three videos from different angles for each jump, as well as force plate data. For more information, please see the detailed description here.
Data Type: Visible Face Images, Approximate Download Size: 4.2 GB
Data Type: RGB Face Video, NIR Face Video, LWIR Face Video, Pulse waveforms, Heart rate, Approximate Download Size: 12 TB
The Deception Detection and Physiological Monitoring (DDPM) dataset captures an interview scenario in which the interviewee attempts to deceive the interviewer on selected responses. The interviewee is recorded in RGB, near-infrared, and long-wave infrared, along with cardiac pulse, blood oxygenation, and audio. After collection, data were annotated for interviewer/interviewee, curated, ground-truthed, and organized into train/test parts for a set of canonical deception detection experiments. The dataset contains almost 13 hours of recordings of 70 subjects, and over 8 million visible-light, near-infrared, and thermal video frames, along with appropriate meta, audio, and pulse oximeter data.
Data Type: IR Iris Still, Approximate Download Size: 2.7 GB
Data Type: IR Iris Still, Approximate Download Size: 2.4 GB
To obtain this data set, you must agree to, and your institution must execute, both the data license agreement and the permission form.
Data Type: Video, Approximate Download Size: 3.3 GB
The VBOLO dataset was collected in several sessions at various checkpoints within public transportation facilities such as tunnels, bridges, and hallways. These capture environments include different camera mount heights and depression angles, illuminations, backgrounds, resolutions, pedestrian poses, and distractors, making the dataset a good scenario for the facial re-identification (ReID) problem. It uses a small set of known individuals ("actors") who move in and out of the surveillance cameras' fields of view, together with unknown persons denoted as "distractors". The "actors" change clothing randomly between each "appearance" in a camera's field of view.
Compared to a typical body-based ReID dataset, which has only a few images per subject, the VBOLO dataset has a large number of annotations for each subject from consecutive video frames, mimicking a real surveillance tracking and detection scenario. This makes matching significantly challenging because: 1) faces change size significantly (e.g., from 12x12 to 150x150 pixels) and exhibit significant pose variations as well; 2) the cameras supplying the probe and gallery images may have different resolutions and points of view.
Data Type: Synthetic Face Images, 3D Head Models, Approximate Download Size: 211 GB
The dataset contains two types of data:
1. A set of 3D head models (.abs files) and their corresponding 2D RGB registration images (.ppm files) of real subjects, male or female in gender and Caucasian or Asian in ethnicity, obtained using a Konica-Minolta ‘Vivid 910’ 3D scanner.
2. A set of 800x600 RGB face images, showing masked faces without context or background, of fully synthetic subjects (identities) that do not exist in reality. The synthetic identities are generated by consistently sampling facial parts from face images of different real identities, who are male or female in gender and Caucasian or Asian in ethnicity.
Since all the identities in this dataset are synthetic, i.e. they do not exist, they can be used freely without privacy concerns. These synthetic face images were generated using Python and OpenGL, with minimal training, and can be used as (1) supplemental training data for CNNs, or (2) additional distractor face images in the gallery for face verification experiments.