Racial Bias within Face Recognition: A Survey

Seyma Yucer, Furkan Tektas, Noura Al Moubayed, Toby P. Breckon


Abstract

Facial recognition is one of the most academically studied and industrially developed areas within computer vision, with associated applications readily deployed globally. This widespread adoption has uncovered significant performance variation across subjects of differing racial profiles, leading to focused research attention on racial bias within face recognition spanning both current causation and potential future solutions. In support, this study provides an extensive taxonomic review of research on racial bias within face recognition, exploring every aspect and stage of the face recognition processing pipeline. Firstly, we discuss the problem definition of racial bias, starting with the definition of race, grouping strategies, and the societal implications of using race or race-related groupings. Secondly, we divide the common face recognition processing pipeline into four stages: image acquisition, face localisation, face representation, and face verification and identification, and review the relevant literature associated with each stage. The overall aim is to provide comprehensive coverage of the racial bias problem with respect to each and every stage of the face recognition processing pipeline, whilst also highlighting the potential pitfalls and limitations of contemporary mitigation strategies that need to be considered within future research endeavours and commercial applications alike.

Introduction

Over several decades, the development of face recognition systems has gathered significant pace across research and industry alike [3, 35, 200]. Companies, nonprofits, and governments have deployed an increasing number of face recognition systems to make autonomous decisions for millions of users [90] across various application areas, such as employment decisions, public security, criminal justice, law enforcement surveillance, airport passenger screening, and credit reporting [4, 97]. However, such wide-scale adoption within real-world scenarios heightens public concern about the potential for abuse and the adverse effect that face recognition may have on some individuals due to the presence of bias [45, 180]. The most prevalent problem pertaining to such bias arises within race and race-related groupings and is referred to as racial bias within face recognition [53].

However, the presence of racial bias within face recognition is not a new phenomenon, nor is it limited to technological systems. Own-race bias has previously been established in psychology [116], showing that humans are less capable of recognising faces from other races than from their own. The prolonged societal experience humans generally have with their own race, especially during their formative years with biological family members, results in biased human perceptual expertise. More specifically, [67] showed how the use of face feature descriptors varies across participants from different racial groupings: darker skin tone participants relied upon face outline, eye size, eyebrows, chin and ears, while lighter skin tone participants relied upon hair colour, texture, and eye colour. Overall, it concludes that lighter skin tone participants use less varied descriptors than darker skin tone participants [67]. Similar to own-race bias, the conversely named other-race effect has also been examined in a series of social psychology studies [5, 157] that establish the social implications of biased human face processing and feature selection, for example in erroneous jury decisions and eyewitness identification.

Accordingly, the first technological study [144] to explore the other-race effect within the context of face recognition algorithms was conducted by East Asian and Western-based research groups, each inherently using datasets gathered locally. The study demonstrates that algorithms trained on the locally gathered face datasets of the Western-based group achieve superior performance on Caucasian faces compared to East Asian faces, and vice versa. Further studies provide extensive evidence of the influence of demographics, including race, gender and age, on both commercial and non-commercial face recognition algorithm performance [86, 141]. Subsequently, the Gender Shades study [13] drew significant attention to gender and skin tone bias within commercial gender classification algorithms by revealing a 34% performance discrepancy between darker skin tone female and lighter skin tone male subjects. Consequently, a growing body of research has emerged to understand and mitigate racial bias within face recognition [96, 117, 216]. These efforts, and the associated evidence of bias, have forced several commercial and academic organisations to withdraw products, algorithms, or datasets due to differing forms of disparity, distortion or bias [17, 118, 173].

However, face recognition remains a long-standing research topic and a common use case within computer vision, one comprising multiple stages of processing, a multitude of downstream tasks and large-scale face recognition datasets in order to achieve high accuracy. With the availability of such large-scale data resources and the advent of Deep Convolutional Neural Networks (DCNN), the accuracy of face recognition algorithms has now exceeded the perceived accuracy requirements for use by the general populace. However, every stage of face recognition, from initial face image acquisition to final performance evaluation, requires attention and investigation to address racial bias, which may otherwise result in disparate outcomes across a diverse user population. Unfortunately, despite the increasing attention to racial bias within face recognition, we are yet to see truly collaborative or tractable solutions emerge from the global research base that could readily address these issues in real-world system deployments [47, 175, 201, 218]. Moreover, face data itself is a private biometric capable of identifying a given individual based on their appearance alone, giving rise to obvious operational privacy and ethical concerns in relation to its processing [25]. Although previous surveys exist on algorithmic bias and fairness in machine learning [34, 115, 140] and on face recognition in computer vision and biometrics [90, 200], many aspects remain under-studied in relation to the specifics of racial bias within face recognition.

At the same time, face recognition is a fast-emerging field of research and application alike, spanning multiple more traditional fields, including machine learning, biometrics, statistics, sociology, and psychology. Therefore, we commonly find that aspects of the problem definition, in addition to race conceptualisation and race-related performance evaluation methodologies, need to be clarified and ideally standardised. It also remains unclear which stages, operations and decisions in face recognition are prone to bias, and how incorrect solutions to the bias issue can cause additional areas of concern; both need to be highlighted in order to maximise the effectiveness of future research in this area. In this survey, we take face recognition as the central concept of our review and aim to provide coverage of all aspects of racial bias within each stage of the face recognition processing pipeline, with additional supporting material spanning fundamental concepts from related fields.

The primary purpose of this study is both to summarise the current state of the art and to give a comprehensive critical review of prior research on the topic of racial bias within face recognition. In addition, we aim to make the reader pertinently aware of the subtleties, and potential areas of ambiguity, in how the racial bias problem within face recognition is itself defined. Furthermore, we aim to identify which parts of the problem have been studied effectively to date and which directions remain open for future contributions to mitigate racial bias within the face recognition domain. In particular, the survey aims to systematically review each of the stages that are commonplace within contemporary face recognition processing pipelines from the perspective of potential racial bias impact: image acquisition (for both dataset collation and deployment), face localisation, face representation, and face verification and identification (final decision-making).

On this basis, we present this survey based on our taxonomy of prior work in the field and its contribution to the current state of the art (Figure 1). Subsequently, we formalise the problem definition together with the corresponding evaluation and fairness criteria (Section 2). Next, we discuss standard race and race-related grouping terminology under three categories: race, skin tone and facial phenotypes (Section 3). This discussion spans the spectrum from grouping definitions, through their adoption, to the associated processing of racial groupings used in literature studies. We then provide a general development schema for face recognition systems and summarise the prior work in the field by aligning it to each development stage (Section 4). Within this section (Section 4), we firstly give an outline description of the general face recognition processing pipeline using consistent notation and symbols. Secondly, we cover image and dataset acquisition processes for face recognition, showing the risks and investigations within this stage. Thirdly, we extend our analysis to face localisation, a mandatory stage whose potentially biased localisation results propagate into the subsequent face recognition stages. Penultimately, in the face representation stage, we categorise the proposed racial bias mitigation approaches based on the underlying machine learning techniques. Finally, we cover face identification and verification tasks and show the impact of methodological decisions within this stage on racial bias. To conclude, we summarise the main critical points of the work and highlight the essential steps that need to be considered within any future research endeavours or commercial applications that aim to mitigate bias or develop fairer face recognition systems (Section 5).
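To make the pipeline outline concrete before the stage-by-stage review, the following minimal Python sketch illustrates the thresholded embedding comparison that constitutes the final verification decision. The earlier stages (image acquisition, face localisation, face representation) are abstracted here as precomputed embedding vectors, and all names and parameters (cosine_similarity, verify, the 512-dimensional embeddings, the 0.5 threshold) are illustrative assumptions rather than the notation or API of any specific system described in this survey.

```python
# A minimal, self-contained sketch of an embedding-based verification decision,
# the final stage of the pipeline outlined above. We assume each face image has
# already been reduced to a fixed-length embedding by some detector + DCNN
# (hypothetical upstream components); only the thresholded comparison is shown.

import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def verify(emb_a: np.ndarray, emb_b: np.ndarray, threshold: float = 0.5) -> bool:
    """Return True if the two embeddings are judged to depict the same identity.

    A single global threshold is applied to the similarity score; the choice of
    this threshold is itself a potential source of racially disparate error rates.
    """
    return cosine_similarity(emb_a, emb_b) >= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    emb_a = rng.normal(size=512)                # stand-in for embed(align(detect(image_a)))
    emb_b = emb_a + 0.1 * rng.normal(size=512)  # a perturbed copy: likely a "match"
    print(verify(emb_a, emb_b))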

Conclusion

We provide a comprehensive critical review of research on racial bias within face recognition. Firstly, we discuss the racial bias problem definition, formalising the notation of the face recognition evaluation process and elucidating the prominent fairness criteria associated with face recognition. Subsequently, we highlight the racial grouping requirement of current fairness criteria, discuss standard race and race-related grouping terminology under three categories: race, skin tone and facial phenotypes, and compare the most prominent grouping strategies across face recognition datasets. The heavy reliance of prior work on racial categories brings additional challenges, as the concept of race is defined and understood under the influence of pre-existing prejudices and discriminatory beliefs. Furthermore, skin tone remains only one trait of a comprehensive and multi-faceted race concept. Although a broader facial phenotype approach provides a more objective and granular evaluation strategy, ensuring that racial interpretations are not reduced to facial phenotypes alone, whilst also considering the broader context of historical and social factors, remains an important and under-explored research topic within the broader goal of achieving more accurate and fairer face recognition performance across increasingly diverse populations.

Furthermore, we explore the contemporary automated facial recognition multi-stage processing pipeline, providing references to related work in the literature. For each stage, we outline the standard procedures and related baselines, the potential sources of bias that can exacerbate racial bias, and the corresponding bias mitigation solutions. Firstly, the image acquisition stage introduces sources of bias (imagery bias, dataset bias) that can affect the accuracy and fairness of face recognition systems; such sources of bias within this initial stage are transferred into the subsequent stages and amplify racial bias in the final performance. Secondly, we consider the face localisation stage in terms of racial bias, where the limited attention to date indicates the existence of racially disparate performance, but further investigation explicitly targeting racial bias within face detection itself is needed. Thirdly, we review the most fundamental works spanning the central stage of the face recognition pipeline, face representation, under three sub-genres: mutual information mitigation, loss function-based mitigation, and domain adaptation-based mitigation, providing an extensive supporting performance comparison across the RFW dataset. Finally, we investigate the final decision-making stage of the face recognition pipeline, face verification and identification, and reveal the impact of decision-making within this stage on overall and group-wise face recognition performance.
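As an illustration of how such group-wise evaluation can be carried out at the decision stage, the short Python sketch below computes per-group false match and false non-match rates at a single global verification threshold. The pair scores, identity labels and group labels are hypothetical placeholders, and the function is only a minimal example of disparity measurement under these assumptions, not the evaluation protocol of any specific benchmark discussed in this survey.

```python
# A minimal sketch of group-wise verification evaluation, assuming we already
# have a similarity score and a ground-truth same/different label for each face
# pair, plus an annotated group label per pair. Reporting false match rate (FMR)
# and false non-match rate (FNMR) per group at one shared threshold makes
# disparities in the final decision stage directly visible.

from collections import defaultdict


def groupwise_error_rates(pairs, threshold):
    """pairs: iterable of (similarity, is_same_identity, group_label)."""
    stats = defaultdict(lambda: {"fm": 0, "impostor": 0, "fnm": 0, "genuine": 0})
    for score, same, group in pairs:
        accept = score >= threshold
        if same:
            stats[group]["genuine"] += 1
            stats[group]["fnm"] += int(not accept)   # false non-match
        else:
            stats[group]["impostor"] += 1
            stats[group]["fm"] += int(accept)        # false match
    return {
        g: {
            "FMR": s["fm"] / max(s["impostor"], 1),
            "FNMR": s["fnm"] / max(s["genuine"], 1),
        }
        for g, s in stats.items()
    }


if __name__ == "__main__":
    # Hypothetical (similarity, same_identity?, group) tuples for two groups.
    pairs = [
        (0.81, True, "A"), (0.42, True, "A"), (0.35, False, "A"),
        (0.74, True, "B"), (0.58, False, "B"), (0.61, False, "B"),
    ]
    print(groupwise_error_rates(pairs, threshold=0.6))
```

Even in this toy example, the two groups exhibit markedly different error profiles under the single shared threshold, which is precisely the kind of disparity that analysis of the decision-making stage seeks to expose.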

Overall, we observe that racial bias is present at each and every technical stage of the face recognition pipeline, yet the cumulative effect remains largely under-explored in the literature. Furthermore, we observe continued bias within the very evaluation strategies employed to measure the presence of this bias, which directly contradicts the technological needs of a modern, diverse global society.