Measuring Hidden Bias within Face Recognition via Racial Phenotypes

Seyma Yucer, Furkan Tektas, Noura Al Moubayed, Toby P. Breckon

Paper Poster Code Video Dataset

Abstract

Recent work reports disparate performance for intersectional racial groups across face recognition tasks: face verification and identification. However, the definition of those racial groups has a significant impact on the underlying findings of such racial bias analysis. Previous studies define these groups based on either demographic information (e.g. African, Asian etc.) or skin tone (e.g. lighter or darker skins). The use of such sensitive or broad group definitions has disadvantages for bias investigation and subsequent counter-bias solutions design. By contrast, this study introduces an alternative racial bias analysis methodology via facial phenotype attributes for face recognition. We use the set of observable characteristics of an individual face where a race-related facial phenotype is hence specific to the human face and correlated to the racial profile of the subject. We propose categorical test cases to investigate the individual influence of those attributes on bias within face recognition tasks. We compare our phenotype-based grouping methodology with previous grouping strategies and show that phenotype-based groupings uncover hidden bias without reliance upon any potentially protected attributes or ill-defined grouping strategies. Furthermore, we contribute corresponding phenotype attribute category labels for two face recognition tasks: RFW for face verification and VGGFace2 (test set) for face identification.

Motivation

Ambiguous Definition of Race: The historical and biological definitions of race vary and racial context is not fixed over time [1].

Privacy of Protected Attributes: Exposing demographic origin within face recognition studies may identify the representation of a particular group, leading to the potential for racial profiling and associated targeting [2].

Confined Groupings: Skin-tone or race-based grouping strategies limit the scope of any study, as they fail to capture the full extent of the racial bias problem within face recognition, which needs to consider both multi-racial and less stereotypical members of such groups.

Racial Appearance Bias: Studies [3,4] show that individuals with a more stereotypical racial appearance suffer poorer outcomes than those with a less stereotypical appearance for their race. A better understanding of the role of phenotypic variation complements solutions for both racial bias and racial appearance bias.

Racial Phenotypes for Face Recognition

We propose using race-related facial (phenotype) characteristics within face recognition to investigate racial bias, by categorising representative racial characteristics of the face and exploring the impact of each characteristic phenotype attribute: skin type, eyelid type, nose shape, lip shape, hair colour and hair type.

Facial phenotype attributes and their categorisation.
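For illustration, this taxonomy can be captured as a simple mapping from each attribute to its discrete categories. The sketch below is a minimal approximation: the category names follow the paper's taxonomy, but the exact label set used for annotation may differ.

# Illustrative phenotype attribute taxonomy (category names are approximate).
PHENOTYPE_CATEGORIES = {
    "skin_type":   ["type_1", "type_2", "type_3", "type_4", "type_5", "type_6"],  # Fitzpatrick-style scale
    "eyelid_type": ["monolid", "other"],
    "nose_shape":  ["wide", "narrow"],
    "lip_shape":   ["big", "small"],
    "hair_type":   ["straight", "wavy", "curly", "bald"],
    "hair_colour": ["red", "blonde", "brown", "black", "grey"],
}

def validate_annotation(annotation: dict) -> bool:
    """Check that a per-subject annotation assigns a valid category to every attribute."""
    return all(
        annotation.get(attr) in categories
        for attr, categories in PHENOTYPE_CATEGORIES.items()
    )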

We annotate the phenotype attributes for each subject of the RFW [5] and VGGFace2 [6] benchmark datasets. For both datasets, we observe that the dominant phenotype attribute categories are Skin Type 2/3, Straight Hair, Narrow Nose, Other (non-monolid) Eyes and Small Lips, which correlates with the dominant presence of Caucasian faces, as can be seen in the figure below.

The distribution of facial phenotype attributes of RFW (left) and VGGFace2 Test (right) datasets.

Experimental Results

Face verification, also known as one-to-one verification, is the task of comparing two different facial images to estimate whether they belong to the same individual subject. We follow two pairing strategies to explore the impact of a single attribute (attribute-based) and of appearance-based facial groups (subgroup-based) on the evaluation performance of face verification. We also test these pairing strategies under two different training setups: Setup 1 (Imbalanced Training Data: VGGFace2) and Setup 2 (Racially Balanced Training Data: BUPT-Balanced).
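As a minimal sketch of the verification step itself (not the released evaluation code), two embeddings produced by the trained model are compared by cosine distance against a decision threshold; the embedding extraction and the threshold value below are placeholder assumptions, with the threshold normally tuned on a validation split.

import numpy as np

def cosine_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine distance between two face embedding vectors."""
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return 1.0 - float(np.dot(a, b))

def verify(embedding_a: np.ndarray, embedding_b: np.ndarray, threshold: float = 0.5) -> bool:
    """One-to-one face verification: accept the pair as the same identity when the
    distance between the two model embeddings falls below the threshold."""
    return cosine_distance(embedding_a, embedding_b) < threshold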

We pair each attribute category with all other attribute categories to assess cross-attribute pairing performance. We clearly show that Type 5, Type 6 and monolid eye pairings have higher false positive matching rates than the others.

Cross-attribute pairing false matching rates: each cell depicts the FMR on a logarithmic scale, log10(FMR), where values closer to zero encode higher (worse) false match rates.
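A hedged sketch of how one cell of such a heatmap could be computed from a file of different-identity pairs is shown below; the distance column name (mirroring the --dist_name flag used by the verification scripts further down) and the operating threshold are assumptions for illustration.

import numpy as np
import pandas as pd

def log10_fmr(pairs: pd.DataFrame, dist_name: str = "vgg_dist", threshold: float = 0.5) -> float:
    """log10(FMR) for one cross-attribute pairing.

    `pairs` is assumed to hold different-identity (impostor) pairs, with the
    column named by `dist_name` giving the embedding distance for each pair;
    a pair counts as a false match when its distance falls under the threshold.
    """
    false_matches = int((pairs[dist_name] < threshold).sum())
    fmr = false_matches / max(len(pairs), 1)
    return float(np.log10(fmr)) if fmr > 0 else float("-inf")

# Illustrative usage on one of the repository's cross-pair files:
# pairs = pd.read_csv("test_assets/AttributeCrossPairs/skintype_type2.csv")
# print(log10_fmr(pairs))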

We create various subgroups with different phenotypic attribute combinations in the dataset. For example, one such subgroup consists of subjects with skin type 3, monolid eyes, straight hair, a wide nose and small lips. The main purpose of such pairing is to show the effect of a single attribute change over a group; for instance, what changes when only the skin type gets darker but all other attributes remain the same? Furthermore, whilst the average accuracy of subgroups with Type {5,6} skin is 86.97%, that of subgroups with Type {1,2} skin is 92.56%, although this notably includes the effects of other attributes.

Subgroup-based face verification performance on RFW, sorted in descending order of accuracy. We create various subgroups, where each subgroup shares the same phenotypic attribute combination.
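To make the subgroup comparison concrete, the sketch below groups verification results by their full attribute tuple, averages accuracy per subgroup and then averages over a chosen set of skin types; the data layout and column names are assumptions for illustration, not the released code.

import pandas as pd

# Attribute columns that together define a phenotype subgroup (hair colour is
# omitted here for brevity).
ATTRIBUTES = ["skin_type", "eyelid_type", "nose_shape", "lip_shape", "hair_type"]

def subgroup_accuracies(results: pd.DataFrame) -> pd.Series:
    """Mean verification accuracy per phenotype subgroup, best-performing first.

    `results` is assumed to hold one row per verification pair, the attribute
    columns above describing the subgroup the pair was drawn from, and a
    boolean `correct` column marking whether the pair was verified correctly.
    """
    return results.groupby(ATTRIBUTES)["correct"].mean().sort_values(ascending=False)

def mean_accuracy_for_skin(accuracies: pd.Series, skin_types) -> float:
    """Average subgroup accuracy over the given skin types; the effects of the
    remaining attributes are still included, as discussed above."""
    table = accuracies.reset_index()
    return float(table[table["skin_type"].isin(skin_types)]["correct"].mean())

# Illustrative usage:
# acc = subgroup_accuracies(results_df)
# mean_accuracy_for_skin(acc, ["type_1", "type_2"])  # vs. ["type_5", "type_6"]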

Getting Started

To start working with this project you will need to take the following steps:

Reproducing experiment results

To reproduce the performance reported in the paper, first align all images to 112x112.

RFW Face Alignment:

python face_alignment.py --dataset_name RFW --data_dir datasets/test/data/African/ --output_dir datasets/test_aligned/African --landmark_file datasets/test/txts/African/African_lmk.txt 

VGGFace Face Alignment:

python face_alignment.py --dataset_name VGGFace2 --data_dir datasets/VGGFace2/ --output_dir datasets/test_aligned/VGGFace2_aligned --landmark_file datasets/VGGFace2/bb_landmark/loose_bb_test.csv

Attribute-based Face Verification:

python face_atribute_verification.py --data_dir datasets/test_aligned/ --model_dir models/setup1_model/model --pair_file test_assets/AttributePairs/setup1/skintype_type1_6000.csv --batch_size 32

Cross Attribute-based Face Verification:

python face_cross_atribute_verification.py --input_predictions test_assets/AttributeCrossPairs/skintype_type2.csv --dist_name 'vgg_dist' --output_path test_assets/AttributeCrossPairs


BibTeX

If you are making use of this work in any way (including our pre-trained models or datasets), please reference the following article in any report, publication, presentation, software release or any other associated materials:

@InProceedings{yucermeasuring,
  author = {Yucer, S. and Tektas, F. and Al Moubayed, N. and Breckon, T.P.},
  title = {Measuring Hidden Bias within Face Recognition via Racial Phenotypes},
  booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  year={2022},
  publisher = {IEEE},
  arxiv = {http://arxiv.org/abs/2110.09839},
}

References

  1. Jayne Chong-Soon Lee. Navigating the topology of race. 1994.
  2. Paul Mozur. One month, 500,000 face scans: How China is using A.I. to profile a minority. The New York Times, 2019.
  3. Keith B. Maddox and Jennifer M. Perry. Racial appearance bias: Improving evidence-based policies to address racial disparities. Policy Insights from the Behavioral and Brain Sciences, 2018.
  4. Allison L. Skinner and Gandalf Nicolas. Looking Black or looking back? Using phenotype and ancestry to make racial categorizations. Journal of Experimental Social Psychology, 2015.
  5. Mei Wang, Weihong Deng, Jiani Hu, Xunqiang Tao, and Yaohai Huang. Racial faces in the wild: Reducing racial bias by information maximization adaptation network. In IEEE International Conference on Computer Vision, 2019.
  6. Qiong Cao, Li Shen, Weidi Xie, Omkar M. Parkhi, and Andrew Zisserman. VGGFace2: A dataset for recognising faces across pose and age. In IEEE International Conference on Automatic Face & Gesture Recognition, 2018.