Testing Isn't The Source of Bias in College Admissions
What if the real bias was availability bias all along?
A policeman sees a drunk man searching for something under a streetlight and asks what he has lost. The drunk says he lost his keys, and they both look under the streetlight together. After a few minutes the policeman asks if he is sure he lost them here, and the drunk replies no, he lost them in the park. The policeman asks why he is searching here, and the drunk replies, "this is where the light is." [Wikipedia]
In the past few years there has been a growing trend towards rejecting the use of standardized testing in college admissions. A common theme is that standardized testing is biased in favor of higher-income and white applicants, to the detriment of applicants of color. The trend has been accelerated by shifts in testing availability during the Covid-19 pandemic - as in-person tests became impractical to administer, many university systems dropped testing requirements [Higher Ed Dive] while proudly proclaiming their decision as furthering social justice.
However, this leaves unasked the question: will removing standardized testing make applications less dependent on socioeconomic privilege…or more so? A new working paper suggests that application essays are even more strongly correlated with socioeconomic status than test scores are. It appears that, much like the drunk searching for his keys, attention has focused on standardized testing because that’s where the data is. Removing allegedly “biased” standardized test scores looks likely to further bias applications towards wealthier and whiter applicants.
The Intuition
A common phenomenon in social policy is that the less transparent a benefit is to access, the more unequal it is in practice. There are explicit hurdles to accessing benefits, such as means-testing, but there are also implicit barriers: requiring extensive paperwork, frequent requests for renewals, or the simplest one of just requiring people to apply at all. The higher the implicit barriers to receiving benefits, the more likely those barriers are to keep out less educated and less wealthy applicants [Oxford University Press]. To put it concretely: when the government sends checks to everyone who has a baby, everyone gets a check. When the government guarantees everyone a check but requires an application with the past ten years of residence and earnings history, not everyone will get a check! And those who don’t will be less educated, less wealthy, and less white than those who do.
It is close to an iron law of benefits that the more layers of discretion and proactivity are required of a recipient, the less equitably the benefit will end up being distributed.
College admissions should follow this same pattern. Standardized tests are a relatively straightforward requirement: take the test and try to do well. Test prep appears to be of limited efficacy [ResearchGate], but everyone can buy a test prep book or access free resources online. The expectations to succeed - and thus access the benefit - are crystal clear. But the expectations for a good college essay are far less clear and rest on implicit, unwritten norms. Unlike on a standardized test, parents can, and do, not just help with but outright write an application essay for their children. The parents who know to do this, and know how to do it well, are more likely to be the already-privileged.
The Evidence
This working paper from a group at Stanford avoids the well-trodden ground of bias in standardized testing and tackles the much thornier problem of bias in application essays. The data set is really cool: 240,000 anonymized essays from 60,000 applicants to the University of California system. They were able to match these against both the applicants’ SAT scores and their household income.
They applied what is called “correlated topic modeling,” which attempts to organically extract coherent “topics” from a large body of text. Each topic consists of a group of words and phrases that frequently occur together. To give you an example, if you fed years of the local paper’s sports section into a topic model, you might extract one topic with “yards”, “down”, and “field goal” and another with “innings”, “batter”, and “pitcher”. Topic modeling relies only on data internal to the documents studied - there is a minimal sketch of the idea below.
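To make the mechanics concrete, here is a minimal sketch assuming scikit-learn, using LDA (a standard topic model closely related to the correlated topic model the authors use) on a few invented toy documents. This is not the paper’s code or data, just an illustration of how topics emerge from word co-occurrence alone.

```python
# Minimal topic-modeling sketch. The paper uses correlated topic modeling;
# scikit-learn ships the closely related LDA, used here for illustration.
# The toy "essays" below are invented stand-ins, not real data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

essays = [
    "threw for two hundred yards and kicked a field goal on fourth down",
    "a long drive down the field ended with another field goal",
    "the pitcher struck out the batter in the final innings",
    "nine innings later the batter faced the same pitcher again",
]

# Bag-of-words counts: the model sees only word co-occurrence,
# nothing external to the documents themselves.
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(essays)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(counts)

# Print the top words per topic; with luck one topic looks like
# football ("yards", "field", "goal") and the other like baseball.
words = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [words[j] for j in topic.argsort()[::-1][:4]]
    print(f"topic {i}: {top}")
```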
They also applied another approach called LIWC, which compares the text against externally defined categories. In LIWC the topics are already defined, so the method measures how well the documents studied match up to these external topics. This is more likely to generate insights like, say, “total punctuation…and longer words…were positively associated with SAT…and [household income] followed a similar pattern” (p. 5). The upshot is, basically, what one might expect: there’s a stronger relationship between essay content and household income than there is between income and SAT score.
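The dictionary-counting mechanics behind this are simple enough to sketch. LIWC’s actual dictionary is proprietary, so the categories, essays, and incomes below are all invented stand-ins; the point is only the shape of the computation - count predefined word categories per essay, then correlate those rates with an outcome.

```python
# LIWC-style sketch with an invented two-category dictionary standing in
# for LIWC's proprietary one. All essays and incomes are fabricated toys.
import re
import numpy as np

categories = {
    "family": {"mother", "father", "sister", "brother", "family"},
    "achievement": {"won", "award", "achieved", "success", "earned"},
}

def category_rate(essay, vocab):
    """Fraction of an essay's words that fall in a given category."""
    words = re.findall(r"[a-z']+", essay.lower())
    return sum(w in vocab for w in words) / max(len(words), 1)

essays = [
    "my mother and father taught my sister and me the value of family",
    "i won an award and earned recognition for my success",
    "i achieved success after my brother won his award",
]
incomes = np.array([40_000, 120_000, 90_000])  # invented household incomes

# Correlate each category's usage rate with income, the same general
# move the paper makes between essay features and SAT / household income.
for cat, vocab in categories.items():
    rates = np.array([category_rate(e, vocab) for e in essays])
    r = np.corrcoef(rates, incomes)[0, 1]
    print(f"{cat}: corr with income = {r:.2f}")
```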
The Conclusion
I love this paper, and in my opinion it’s one of the better applications I’ve seen of “data science” to a thorny social problem. The data set is super cool. The approach is close to purely descriptive: they apply data science tools to describe what’s happening rather than advancing causal arguments. And it is an attempt to actually put numbers and evidence against a type of claim that, in the pre-data science era, would have been extremely difficult or even impossible to rigorously evaluate.
And it casts doubt on the conventional framing of both the problem and the solution. By making essay content “legible” and measurable, it suggests that this standard is even less equitable than the standardized tests essays might replace. Much like the drunk searching for his keys below the streetlight, it’s easy to imagine that standardized tests have become bugaboos for activists simply because they are so visible. This research suggests that perhaps there is simply a basic contradiction between improving social equity and using high-stakes admissions at selective institutions as a gatekeeper for social advancement.