VoxEU Column Labour Markets Gender

Women are "hardworking", men are "brilliant": Stereotyping in the economics job market

Academia faces increased scrutiny because of its gender imbalance. This column uses machine learning methods to analyse gendered patterns in the text of reference letters written for candidates for entry-level positions in the economics job market. The findings reveal that women are systematically more likely to be praised for being hardworking and at times less likely to be praised for their ability. Given the time and effort letter writers devote to supporting their students, the authors suggest this gender stereotype is likely due to unconscious biases.

Academia faces increased scrutiny due to gender imbalances (Valian 1999), and this is especially true of economics (Lundberg 2020). Recent empirical work has documented that the career pipeline for women is ‘leaky’, with women dropping out of the profession at critical junctures such as between earning a PhD and becoming an assistant professor, or between assistant and associate professorships (Lundberg and Stearns 2019). In a recent paper (Eberhardt, Facchini, and Rueda 2022), we study the first step in the academic career of an economist – the junior ‘job market’. This is the stage at which the ‘leak’ in the pipeline has grown the most over the past decade (Lundberg and Stearns 2019) and which so far has not received much systematic attention in the literature (Lundberg 2020). 

The academic job market in economics is uniquely structured and centralised. Every autumn, universities post job advertisements and potential applicants prepare a ‘job market package’. This package consists of one or more academic papers, a CV, and a set of reference letters written by scholars familiar with the candidate (henceforth, ‘referees’). In this market, candidates, referees, and hiring committees interact via centralised platforms. The same package is used for most jobs, making the marginal cost of an additional application very low. Reference letters are typically not tailored to any particular institution (Coles et al. 2010). As a result, any institution receiving many applications is likely to observe a sample that is arguably representative of the reference letters in the market. 

A unique dataset of reference letters

We use a unique dataset encompassing all applications for entry-level positions received by a research-intensive university in the UK between 2017 and 2020. Applying natural language processing tools, we analyse the text of over 9,000 reference letters written in support of 2,800 candidates. We find that a standard letter contains a lengthy discussion of the candidate’s job market paper and some mention of their additional research, teaching, and other skills. The final section summarises the candidate’s academic abilities and recruitment prospects. Our analysis focuses mainly on this final section.

After transforming this text into a ‘term frequency-inverse document frequency representation’, we borrow from cognitive psychology and linguistics to quantify whether letters written for female candidates emphasise systematically different attributes from those written for male candidates.
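To make the ‘term frequency-inverse document frequency’ step concrete, here is a minimal sketch in Python. The weighting used (term share multiplied by log(N / document frequency)), the whitespace tokenisation, and the toy letters are all assumptions made for illustration; the column does not state which variant the authors used.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Invented toy letters, not taken from the data.
letters = [
    "she is hardworking and diligent",
    "he is brilliant and creative",
    "she is diligent and creative",
]
weights = tfidf(letters)
```

Terms that appear in every letter (such as “is”) receive a weight of zero, while a term that appears in only one letter (such as “hardworking”) is weighted more heavily than one that appears in several.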


We use two complementary methods to analyse the letters. We start with an unsupervised approach, the LASSO technique, to select the terms that best predict a candidate’s gender. Among these predictors, we observe terms related not only to research interests but also to personality and traits (“determined”, “diligent”, “hardworking”, etc.).
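The idea of letting LASSO pick out gender-predictive terms can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses a squared-error LASSO fitted by coordinate descent on a 0/1 female indicator, a hand-picked penalty, and an invented term-indicator matrix. The column does not say which LASSO variant or penalty was used.

```python
def soft_threshold(value, penalty):
    """Shrink value towards zero by penalty; return 0.0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso(X, y, penalty, n_sweeps=200):
    """Minimise (1/2n)*||y - Xb||^2 + penalty*||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    # Centre outcome and features so that no intercept is needed.
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    residual = [yi - y_mean for yi in y]  # residual with all coefficients at 0
    coefs = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            scale = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if scale == 0.0:
                continue  # a constant column carries no information
            # Covariance of term j with the residual once term j is added back.
            rho = sum(
                Xc[i][j] * (residual[i] + Xc[i][j] * coefs[j]) for i in range(n)
            ) / n
            new_coef = soft_threshold(rho, penalty) / scale
            delta = new_coef - coefs[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                coefs[j] = new_coef
    return coefs


# Invented data: rows are letters, columns indicate whether a term appears.
terms = ["diligent", "brilliant", "economics"]
X = [
    [1, 0, 1], [1, 0, 0], [1, 1, 1], [0, 0, 0],
    [1, 1, 1], [0, 1, 0], [0, 1, 1], [0, 0, 0],
]
y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = letter written for a female candidate
coefs = lasso(X, y, penalty=0.05)
selected = {t: c for t, c in zip(terms, coefs) if c != 0.0}
```

In this toy example the penalty keeps “diligent” (positive coefficient) and “brilliant” (negative coefficient) and sets the uninformative “economics” to exactly zero, which is the selection behaviour the approach relies on.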

We then apply a supervised method by building dictionaries of words that are commonly emphasised in reference letters. These dictionaries were constructed by drawing on existing research on the topic. In particular, Schmader et al. (2007) propose five language categories (‘sentiments’) that are usually present in academic reference letters. These are ability traits, ‘grindstone’ traits, research terms, standout adjectives, and teaching and citizenship terms. We add the recruitment prospects of the candidate as an additional category.

Ability traits refer to the intellectual capacity of the student, and include terms such as “talent”, “brilliant”, “creative”, and so on. ‘Grindstone’ traits relate, in the words of Trix and Psenka (2003), to “putting one’s shoulder to the grindstone”, and include terms such as “hardworking”, “conscientious”, “diligent”, and so on. Research terms describe the type of research carried out (e.g. “applied economics”, “game theory”, “public economics”, etc.). Standout terms are typically superlatives such as “excellent”, “outstanding”, or “rare”. The teaching and citizenship category refers to  both the candidate’s teaching skills and interactions with colleagues. Language in this group includes “good teacher”, “excellent colleague”, “friendly”, and so on. The last category, recruitment prospects, has been added to identify words that are widely used to describe the expected placement of the candidate in the highly competitive and globalised labour market for young economists. Terms in this group include “highly recommended”, “top department”, “tenure track”, and so on. 
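A dictionary-based measure of this kind can be sketched as below. The dictionaries contain only the example terms quoted above, not the full lists, and the scoring rule (share of a letter's words that fall in each category) is an assumption made for illustration. Multi-word terms such as “good teacher” would need phrase matching, which this sketch omits.

```python
import re
from collections import Counter

# Example terms quoted in the column only; the full dictionaries are longer.
DICTIONARIES = {
    "ability": {"talent", "brilliant", "creative"},
    "grindstone": {"hardworking", "conscientious", "diligent"},
    "standout": {"excellent", "outstanding", "rare"},
}


def sentiment_shares(text):
    """Return, per category, the share of words in text found in its dictionary."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {
        category: (sum(counts[term] for term in terms) / total if total else 0.0)
        for category, terms in DICTIONARIES.items()
    }


# Invented sentence, not taken from any letter in the data.
letter = "She is a diligent, hardworking and creative researcher with outstanding results."
shares = sentiment_shares(letter)
```

Here two of the eleven words are ‘grindstone’ terms, while one each falls in the ability and standout categories.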

To validate our dictionaries, we carried out an original, comprehensive survey of academic economists in research-intensive UK universities. The results of this exercise, reported in Figure 1, indicate a broad consensus between our categorisation and that of the profession.

Figure 1 Correspondence between authors' categories and ‘wisdom of the crowd’



We estimate the correlation between the importance of each sentiment in the reference letter and the gender of the candidate, controlling for overall letter length and a set of candidate and referee characteristics. The baseline results are reported in Figure 2, where we plot the estimated coefficients on the female dummy from a total of 672 different models. A darker symbol indicates a higher number of specifications that yield statistically significant estimates for the parameter of interest. Full symbols indicate significance at the 1% level across all clustering choices. Hollow symbols indicate that the estimate is not statistically significant at any level.
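The count of 672 models follows from the breakdown given in the endnotes: six sentiment types, seven sets of controls, four types of standard error clustering, and four subsamples based on the ranking of the letter writer's institution. The sketch below enumerates that grid. The labels for the control sets and ranking subsamples are hypothetical placeholders, since the column does not name them.

```python
from itertools import product

sentiments = [
    "ability", "grindstone", "research",
    "standout", "teaching_citizenship", "recruitment",
]
# Hypothetical labels: the seven control sets are not named in the column.
control_sets = [f"controls_{i}" for i in range(1, 8)]
clusterings = [
    "robust", "letter_writer",
    "writer_institution", "candidate_phd_institution",
]
# Hypothetical labels: the four ranking subsamples are not named in the column.
ranking_subsamples = [f"ranking_group_{i}" for i in range(1, 5)]

# One dictionary per model specification.
specifications = [
    {"sentiment": s, "controls": c, "clustering": k, "subsample": r}
    for s, c, k, r in product(
        sentiments, control_sets, clusterings, ranking_subsamples
    )
]
n_models = len(specifications)  # 6 * 7 * 4 * 4
```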

Figure 2 Baseline regression results



The main message from this figure is that, regardless of institutional ranking, female candidates are significantly more likely to be associated with ‘grindstone’ terms (by 6% to 10% of a standard deviation) across all specifications. These results confirm our interpretation of the unsupervised LASSO analysis. We also observe that fewer terms related to research are used in letters supporting female candidates. Both results echo findings in other disciplines (Trix and Psenka 2003, Valian 2005). The findings are remarkably stable across specifications, which suggests that unobserved confounders are unlikely to change the results.

Sorting of female candidates across institutions or letter writers might play an important role in explaining differences in the language used. For example, Boustan and Langan (2019) document that female representation is a persistent attribute of economics departments and that it matters for promoting women’s careers. To address sorting across departments, we run models including candidate institution fixed effects. The results are reported in Figure 3 and suggest that among students from the same cohort graduating from the same institution (who, for instance, were admitted to their PhD programme under arguably the same entry criteria), women are still significantly more likely to be described with ‘grindstone’ terms.

Sorting across letter writers could still explain our findings. To address this concern, we run a set of specifications including writer fixed effects. Note that we only include referees who have written two or more letters, with at least one for a female candidate. These results are also reported in Figure 3 and are broadly consistent with the patterns above.  

Figure 3 Regressions with fixed effects


In the same figure, we separately analyse referees with less and with more experience of female candidates, defined as those for whom fewer or more than 50% of their references were written for women. The ‘less experienced’ group appears to be the one that drives the ‘grindstone’ result.
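The sample restriction and the experience split described above can be sketched as follows. The data are invented, and the treatment of referees with exactly 50% of letters written for women (left out of both groups) is an assumption, since the column does not say how such cases are handled.

```python
from collections import defaultdict

# Invented (referee, candidate_is_female) pairs, one per letter.
letters = [
    ("ref_a", True), ("ref_a", False), ("ref_a", False),
    ("ref_b", True), ("ref_b", True), ("ref_b", False),
    ("ref_c", False), ("ref_c", False),
    ("ref_d", True),
]


def split_referees(letters):
    """Group eligible referees by the share of their letters written for women."""
    by_referee = defaultdict(list)
    for referee, is_female in letters:
        by_referee[referee].append(is_female)
    groups = {"less_experienced": [], "more_experienced": []}
    for referee, flags in sorted(by_referee.items()):
        # Keep referees with two or more letters, at least one for a woman.
        if len(flags) < 2 or not any(flags):
            continue
        share_female = sum(flags) / len(flags)
        if share_female < 0.5:
            groups["less_experienced"].append(referee)
        elif share_female > 0.5:
            groups["more_experienced"].append(referee)
        # Exactly 50%: left unassigned here (an assumption).
    return groups


groups = split_referees(letters)
```

In this toy sample, one referee is dropped for having written no letters for women and another for having written only one letter; the remaining two fall into the ‘less experienced’ and ‘more experienced’ groups respectively.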

Experience may matter for two main reasons. On the one hand, referees may differ in their perception of women, and female candidates may seek out those who are less prone to stereotyping. On the other hand, it could be that referees do not differ initially, but that their exposure to female candidates leads them to update prior stereotypes. Further research is needed to disentangle these two mechanisms.

The fixed effects results also uncover a new pattern with regard to ‘ability’. Female candidates are associated with noticeably fewer ability terms, although the difference is not statistically significant, with a clearer pattern emerging in the ‘less experienced’ group. In further analysis, we show that this pattern is significant for male letter writers in this group.

Lessons learned

As academics, we all know how much time is spent writing and polishing reference letters for job market candidates. This is an occasion where we try our best to promote our students. As a result, it is unlikely that, on average, we are deliberately undermining female students by emphasising less desirable attributes; unconscious bias is the more likely explanation. On a positive note, recent research has shown that unconscious biases can be addressed by providing the actors involved with evidence of the existence of such biases (Boring and Philippe 2021). By documenting gendered language patterns, we hope this research will be a first step towards increasing awareness of our biases and thereby reducing stereotyping in the job market.


References

Beaman, L, R Chattopadhyay, E Duflo, R Pande, and P Topalova (2009), “Powerful women: Does exposure reduce bias?”, The Quarterly Journal of Economics 124(4): 1497-1540.

Boring, A and A Philippe (2021), “Reducing discrimination in the field: Evidence from an awareness raising intervention targeting gender biases in student evaluations of teaching”, Journal of Public Economics 193: 104323.

Boustan, L and A Langan (2019), “Variation in Women's Success across PhD Programs in Economics”, Journal of Economic Perspectives 33(1): 23-42.

Coles, P, J Cawley, P B Levine, M Niederle, A E Roth, and J J Siegfried (2010), “The job market for new economists: A market design perspective”, Journal of Economic Perspectives 24(4): 187-206.

De Fraja, G, G Facchini, and J Gathergood (2016), “Professorial salaries and research performance in the 2014 Research Excellence Framework”, 03 August.

De Fraja, G, G Facchini, and J Gathergood (2019), “Academic salaries and public evaluation of university research: Evidence from the UK Research Excellence Framework”, Economic Policy 34(99): 523-583.

Eberhardt, M, G Facchini, and V Rueda (2022), “Gender Differences in Reference Letters: Evidence from the Economics Job Market”, CEPR Discussion Paper 16960.

Lundberg, S (ed.) (2020), Women in Economics, CEPR Press.

Lundberg, S, and J Stearns (2019), “Women in economics: Stalled progress”, Journal of Economic Perspectives 33(1): 3-22.

Schmader, T, J Whitehead, and V H Wysocki (2007), “A linguistic comparison of letters of recommendation for male and female chemistry and biochemistry job applicants”, Sex Roles 57(7): 509-514.

Trix, F, and C Psenka (2003), “Exploring the color of glass: Letters of recommendation for female and male medical faculty”, Discourse & Society 14(2): 191-220.

Valian, V (1999), Why so slow? The advancement of women, MIT Press.

Valian, V (2005), “Beyond gender schemas: Improving the advancement of women in academia”, Hypatia 20(3): 198-213.


Endnotes

1 We contacted all faculty employed at UK economics departments submitted to the 2014 Research Excellence Framework (REF). For more information, see De Fraja et al. (2016, 2019).

2 Candidate characteristics include ethnicity/race, the year they entered the job market, the RePEc ranking of the PhD-awarding institution, the number of years since PhD graduation, the broad field of specialisation and the publication record – including both the total count of publications and the number of articles published in the Top Five, other top general interest, and top field journals. Letter writer characteristics include gender, RePEc ranking of their institution, and the number of reference letters written for candidates in our sample. 

3 More specifically, we have six different sentiment types, seven different sets of controls, four different types of standard error clustering (robust standard errors, clustered by letter writer, clustered by letter writer’s institution, and clustered by candidate’s PhD-awarding institution) and four subsamples based on the ranking of the letter writer’s institution.

4 See Eberhardt et al. (2022) for more details.
