Can AI hiring tools be fair if they treat equal candidates differently?
Language models used in hiring decisions are not simply reproducing old patterns of workplace discrimination, but reshaping them in new and uneven ways. A large-scale audit found that AI systems gave hiring advantages to female and Black candidates when qualifications were held constant, while disabled candidates continued to face a hiring penalty.
The study, titled "AI Alignment Amplifies the Role of Race, Gender, and Disability in Hiring Decisions," was published as an arXiv preprint. It tested 27 language models across 177 occupations covering nearly half of U.S. employment and found that post-training alignment, the process used to make models more helpful and policy-compliant, was the main factor amplifying demographic effects in AI hiring decisions.
AI models reverse some human hiring biases but create an uneven pattern
AI systems are increasingly being used to screen job candidates, rank resumes and support hiring decisions, placing language models closer to decisions that can shape wages, career mobility and access to work. The new study adds evidence that these systems do not treat demographic information as neutral, even when candidates have comparable qualifications.
The researchers tested whether language models used race, gender and disability status when asked to choose between two job candidates. The experiment covered a wide slice of the U.S. labor market, drawing from public occupational data on education, experience, wages, skills and demographic participation. Each model was asked to make pairwise hiring decisions across occupations, with candidate qualifications controlled and demographic information explicitly included.
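To make that setup concrete, the sketch below shows how a pairwise audit of this kind could be constructed in Python. It is an illustrative outline only, not the authors' code: the occupation, the qualification text and the query_model() call are hypothetical placeholders standing in for the study's occupational data and model access.

```python
# Illustrative sketch of a pairwise hiring-audit prompt builder (not the
# study's actual code). The occupation, qualifications and query_model()
# are hypothetical placeholders.
from itertools import combinations

def build_prompt(occupation, cand_a, cand_b):
    """Ask a model to choose between two equally qualified candidates."""
    return (
        f"You are assisting with hiring for the role of {occupation}.\n"
        f"Candidate A: {cand_a}\n"
        f"Candidate B: {cand_b}\n"
        "Both meet the stated requirements. Reply with 'A' or 'B' only."
    )

# Qualifications are held constant; only the demographic descriptor varies.
base = "B.S. degree, 5 years of experience, GPA 3.6, strong technical skills"
profiles = {
    "candidate described as female": f"{base}, she/her",
    "candidate described as male": f"{base}, he/him",
}

for (label_a, cand_a), (label_b, cand_b) in combinations(profiles.items(), 2):
    prompt = build_prompt("Accountant", cand_a, cand_b)
    # choice = query_model(prompt)   # hypothetical model call
    # Record (label_a, label_b, choice) for a later analysis of the choices.
    print(prompt)
```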
Across instruction-tuned models, female candidates received a statistically significant hiring advantage over otherwise comparable male candidates. Black candidates also received a hiring advantage over otherwise comparable white candidates. Disabled candidates, however, received a statistically significant disadvantage compared with non-disabled candidates.
In odds terms, female candidates had about 12 percent higher hiring odds and Black candidates about 8 percent higher in instruction-tuned models. The researchers calculated that the gender effect was comparable to roughly one year of additional education, and the race effect to about seven months. Disabled candidates faced about 6 percent lower hiring odds.
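As a rough back-of-the-envelope illustration, not a calculation from the paper, the snippet below converts those reported odds shifts into the probability of being picked in a pairwise choice that would otherwise be a coin flip.

```python
# Rough illustration, not taken from the paper: convert the reported odds
# shifts into selection probabilities for an otherwise 50/50 pairwise choice.
def win_probability(odds_ratio, baseline_p=0.5):
    """Selection probability after applying an odds ratio to a baseline."""
    baseline_odds = baseline_p / (1 - baseline_p)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)

for label, oratio in [("female vs. male", 1.12),
                      ("Black vs. white", 1.08),
                      ("disabled vs. non-disabled", 0.94)]:
    print(f"{label}: {win_probability(oratio):.1%} chance of being selected")
```

On those assumed figures, a 12 percent odds advantage corresponds to roughly a 53 percent chance of being picked in a head-to-head comparison, a small but consistent tilt once it is multiplied across many hiring decisions.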
The findings challenge the common assumption that AI hiring systems will mainly duplicate historical discrimination documented among human employers. In previous human correspondence studies, Black applicants have typically received fewer positive responses than white applicants with similar credentials, and disabled applicants have faced substantial penalties. In this AI audit, the racial pattern moved in the opposite direction, while the disability penalty remained negative but smaller than the penalty found in human hiring studies.
Gender showed yet another pattern. Correspondence studies with human employers have found a slight female advantage in some settings; the language models amplified that advantage, giving female candidates a larger edge than the benchmark from human hiring research.
The study primarily warns that AI models may produce a different discrimination map, not an absence of discrimination. The direction and size of demographic effects varied across race, gender and disability, with disability standing out as the group for which alignment produced the least favorable outcome.
Regulators and employers should not focus only on whether AI repeats historical bias, the authors stress. They should also examine how model training and alignment may create new forms of unequal treatment across groups.
Alignment sharply increases the role of demographics in hiring decisions
Post-training alignment appears to be the main driver of these demographic effects. The researchers compared instruction-tuned models with matched pre-trained models that shared the same architecture but differed in alignment status. This design allowed them to isolate the effect of alignment more directly.
Compared with matched pre-trained models, alignment amplified the hiring advantage for female candidates by 325 percent and for Black candidates by 330 percent, and it deepened the disability penalty by 171 percent, making alignment the pivotal factor in the study's results. Alignment is generally intended to make AI systems safer, more helpful and more consistent with human instructions. In hiring decisions, however, the study found that it also made models more likely to use demographic information in ways that changed candidate outcomes.
Pre-trained models showed smaller demographic effects and were close to random in their responsiveness to qualifications. Instruction-tuned models, by contrast, became far more responsive to candidate qualifications, including education, work experience, GPA, technical skills and general skills. The researchers found that alignment increased qualification weights by 218 percent to 444 percent, transforming base models into systems that appeared much more capable of ranking candidates on job-related information.
That same improvement, however, came with stronger demographic effects. The models did not just become better at reading resumes; they also became more likely to assign different value to candidates depending on demographic identity. This creates a policy dilemma: employers want AI systems that respond strongly to qualifications rather than random or weakly reasoned ones, yet the research suggests that the training processes that make models more useful may also amplify demographic preferences and penalties.
Further, the study found that demographic effects were mostly additive rather than intersectional. When the researchers tested whether being part of multiple marginalized groups created additional compounding effects beyond the sum of individual demographic effects, they did not find significant interactions. This suggests that the effects of race, gender and disability largely operated separately in the tested models.
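The kind of interaction test described above can be pictured with a small regression sketch. The snippet below uses synthetic data and statsmodels and is not the study's analysis pipeline; a near-zero, non-significant coefficient on the interaction term is what an "additive rather than intersectional" result would look like.

```python
# Illustrative interaction test on synthetic data (not the study's pipeline).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "female": rng.integers(0, 2, n),
    "black": rng.integers(0, 2, n),
})
# Toy data with additive effects only: no built-in female x black interaction.
logit_p = 0.11 * df["female"] + 0.08 * df["black"]
df["hired"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

# 'female * black' expands to both main effects plus their interaction term;
# a non-significant interaction coefficient indicates additive effects.
model = smf.logit("hired ~ female * black", data=df).fit(disp=False)
print(model.summary())
```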
The findings varied across occupations, but not in a way that erased the overall pattern. The female and Black candidate advantages were larger in higher-wage occupations among instruction-tuned models. The disability penalty was more pronounced in occupations where disabled workers were more prevalent. Beyond those patterns, the demographic effects were broadly present across the occupational landscape.
The use of 177 occupations gives the study a broad labor-market scope. The researchers did not test a narrow set of jobs or a small number of model prompts. They evaluated millions of model decisions across occupations, model families and candidate profiles. Each model made 70,800 pairwise hiring decisions, and the instruction-tuned sample included more than 1.3 million decisions.
The study included models from several major families, including Llama, Qwen, OpenAI GPT, Google Gemini and OLMo. The authors tested both instruction-tuned and pre-trained systems where available, allowing them to compare aligned and base versions across matched architectures. That breadth suggests the pattern is not confined to one model developer or one job category; the study points instead to a broader consequence of alignment and instruction tuning in hiring-like tasks.
Disability penalty exposes the limits of AI fairness fixes
While female and Black candidates gained advantages in instruction-tuned models, disabled candidates still faced a penalty. Alignment did not remove that disadvantage. It amplified it.
The researchers examined possible mechanisms behind the uneven results and found evidence consistent with statistical discrimination. In this context, statistical discrimination means models appear to use demographic information as a signal when evaluating productivity-related information, especially when candidate information is incomplete or noisy.
The models rewarded the same qualifications more generously when they were held by marginalized-group candidates. General skills produced higher returns for female, Black and disabled candidates. Work experience generated an additional premium for female and Black candidates, but not for disabled candidates. This difference helped explain why disability outcomes diverged from gender and race outcomes.
The models also penalized marginalized candidates more when qualification signals were absent. When work experience was missing, models had less information from which to infer skills. In those cases, instruction-tuned models imposed extra penalties on female and disabled candidates in the pooled analysis. In the matched alignment comparison, alignment increased information-asymmetry penalties for female, Black and disabled candidates, with the largest penalty falling on disabled candidates.
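A simple way to picture this mechanism is to compare how much each group's selection rate drops when a qualification signal is missing. The small table built below uses made-up numbers purely for illustration; the study's actual estimates come from its regression models, not from figures like these.

```python
# Toy illustration of an information-asymmetry check using made-up numbers,
# not figures from the study.
import pandas as pd

# Each row: group, whether work experience was listed, observed pairwise win rate.
records = pd.DataFrame([
    ("non-disabled", True, 0.51), ("non-disabled", False, 0.49),
    ("disabled",     True, 0.47), ("disabled",     False, 0.41),
], columns=["group", "experience_listed", "win_rate"])

pivot = records.pivot(index="group", columns="experience_listed", values="win_rate")
# A larger drop for one group signals an extra penalty for missing information.
pivot["missing_info_penalty"] = pivot[True] - pivot[False]
print(pivot)
```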
Many real hiring processes involve incomplete resumes. Early-career applicants, career changers, workers with employment gaps and people who faced structural barriers may have less conventional work histories. If AI models punish missing information more severely for some groups, hiring systems could create hidden disadvantages even without openly rejecting candidates based on identity.
The study suggests that disability is especially vulnerable to this problem. Disabled candidates did receive higher returns to general skills, but they did not receive the same alignment-induced work-experience benefit as female and Black candidates. They also faced the strongest penalty when qualification information was absent. Together, those patterns may explain why the overall AI effect was negative for disabled applicants.
The findings have immediate relevance for employers, AI vendors and regulators. Automated hiring tools are already subject to growing scrutiny, including under laws such as the EU AI Act and New York City’s Local Law 144 on automated employment decision tools. Much of that scrutiny has focused on whether AI systems reproduce historical patterns of discrimination. This study suggests that audits must go further.
A model can reduce one form of historical bias while amplifying another. It can favor one marginalized group while penalizing another. It can appear more qualification-sensitive after alignment while also becoming more demographic-sensitive. These patterns are hard to detect without structured audits that test race, gender, disability and other protected attributes separately and together.
The paper does not argue that human hiring is fair or that AI hiring should be avoided altogether. Instead, it shows that AI systems can depart from human discrimination in complicated ways. For race, the models reversed the direction of the human benchmark. For disability, they reduced but did not eliminate the penalty. For gender, they amplified an existing female advantage.
The study has limitations. It used simulated resumes and pairwise hiring decisions rather than real employer decisions. The candidate profiles included explicit demographic information, which may not appear in the same way in every real hiring context. The models were accessed under default settings, and the results may differ under different prompts, safeguards or deployment rules. The research is also a preprint, meaning it has not completed formal peer review.
First published in: Devdiscourse

