The Census may have survived Covid [Standard Errors], but the Census Bureau is hard at work destroying one of its most valuable products. This is a controversy that is, unlike the Covid-related fights, happening almost entirely out of public view. The issue at hand is the quality of Census block-level data, the detailed data the Census puts out about how many people live in a given (tiny) area, their education, income, etc.
The Census writ large is one of the most important sources of data the government puts out, at least in terms of understanding our country and what’s happening. The block-level data is the most granular, useful information the government produces to help us understand American society. And now the Census Bureau is busily engaged in making it less usable in the name of privacy.
While the costs of this data tampering are clear, it’s not clear that a single American will have their privacy meaningfully improved by this change.
What’s the controversy?
A census block is the smallest geographical unit of Census-level data [GIS Wiki]. In an urban area a census block frequently corresponds to an actual city block. In rural areas a block is usually some area bounded by, e.g., roads or rivers, but drawing them is more of an art than a science. Blocks are quite small! In a dense urban area they may contain hundreds of people, but in a rural area it’s not uncommon to have zero people in a block. They’re frequently rolled up into the more familiar census tract with a few hundred or few thousand people, e.g., this very cool map from the Opportunity Atlas [link] I used earlier this year:
Census blocks are a common tool whenever someone cares about where people live. They are used to apportion and draw the lines of political districts [Census Bureau], a charged issue at a time when many states are rushing to draw gerrymandered & uncompetitive districts before the next election [The Guardian]. For the same reason, they’re a key resource for citizen groups fighting against gerrymandering [Politico]. They’re also used in a lot of mundane governmental applications for which accurate, granular data is important, such as transportation planning [CensusCounts.org].
In the 2020 Census, noise is being introduced into this block-level data. The noise is inserted to keep the data anonymous, which is a statutory requirement for Census publications [NPR]. This requirement used to be simple to fulfill, but advances in the computational de-anonymization of data (and easy access to personal data sold on the internet) have made it harder. So for 2020 the Bureau has rolled out “differential privacy” measures that introduce noise by varying some of the data fields at the Census block level [Census.gov].
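To give a feel for what “introducing noise” means, here is a minimal Python sketch of the Laplace mechanism, a textbook differential privacy technique. It is an illustration only and not the Bureau’s actual algorithm, which is considerably more elaborate; the block populations, the `epsilon` value, and the function names are all hypothetical.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution."""
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_counts(true_counts, epsilon, rng):
    """Add Laplace noise with scale 1/epsilon to each count.

    Assumes each person changes a count by at most 1 (sensitivity 1).
    Smaller epsilon means more privacy and more noise.
    """
    scale = 1.0 / epsilon
    return [count + laplace_noise(scale, rng) for count in true_counts]


rng = random.Random(2020)          # fixed seed so the sketch is repeatable
blocks = [0, 3, 12, 147, 412]      # hypothetical block populations
published = noisy_counts(blocks, epsilon=0.5, rng=rng)

for true, noisy in zip(blocks, published):
    print(f"true={true:4d}  published={noisy:8.2f}")
```

Note that raw noisy values like these can be fractional or negative, so a real system would need further processing before publishing them as counts.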
The introduction of differential privacy should not affect state population counts, or even have much effect on data at the tract level. It should not distort “apportionment”, the number of Congressional seats assigned to each state. However, it very well might affect the highly geographically sensitive task of redistricting within each state, which relies on very granular decisions. With the recent release of Census redistricting data, the Census Bureau maintains that redistricting should be unaffected by differential privacy measures [Census.gov]. But this is hotly disputed by academics and advocates studying voting rights, who found that the distortions introduced pose a very real risk of harming minority voting rights [Washington Post].
[Researchers] found that applying differential privacy caused the accuracy of population counts to suffer, particularly in districts that had high racial and ethnic diversity, and cast doubt on the government’s ability to rely on this data to enforce the “one person, one vote” principle that ensures every American an equally weighted vote, said Christopher Kenny, a PhD candidate at Harvard’s Department of Government. [1]
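The intuition for why noise can matter for small areas while largely washing out in big totals can be sketched in a few lines of Python. This assumes independent Laplace noise added to each block, which is a simplification and not the Bureau’s actual procedure; the block sizes, noise scale, and names below are made up for illustration.

```python
import random


def laplace_noise(scale, rng):
    # Difference of two independent exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


rng = random.Random(42)
noise_scale = 2.0              # hypothetical noise scale, in people
true_blocks = [30] * 1000      # 1,000 hypothetical blocks of 30 people each
noisy_blocks = [n + laplace_noise(noise_scale, rng) for n in true_blocks]

# Average relative error of a single block's published count.
block_rel_error = sum(
    abs(noisy - true) / true for true, noisy in zip(true_blocks, noisy_blocks)
) / len(true_blocks)

# Relative error of the total over all 1,000 blocks.
true_total = sum(true_blocks)
total_rel_error = abs(sum(noisy_blocks) - true_total) / true_total

print(f"typical block error: {block_rel_error:.1%}")
print(f"error in the total:  {total_rel_error:.2%}")
```

Because independent errors partly cancel when summed, the total should come out far more accurate in relative terms than a typical block, which is why the worry centers on fine-grained uses like drawing district lines.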
Whither block-level data?
Applying differential privacy to the Census block data will not materially improve the privacy of citizens in the United States. There is a thriving industry dedicated entirely to the packaging and sale of your personal data [PrivacyBee]. That industry will be unaffected by the introduction of differential privacy for Census blocks, as the people most interested in invading your privacy are not getting personal information by clever de-anonymization of Census blocks. Instead they are opening an account with Acxiom, or Experian, or a million other less scrupulous vendors who will very happily share a huge amount of your highly sensitive personally identifying information with stalkers or identity thieves for a nominal fee [The Verge, Consumer Reports].
On the cost side of the ledger, deliberately reducing the quality of information available to the public and to the government itself cannot conceivably improve Americans’ ability to understand our society or the government’s ability to fulfill its duties. This change is expected to harm our ability to plan our cities or to protect voting rights, and it will create room for doubt in the data that will be wielded as a weapon by those most opposed to fair elections. Against these costs, it’s hard to see how anyone’s privacy will benefit in a manner meaningful to them.
[1] Other researchers quoted in the Post dispute this claim. The Census claims that tweaks made since April should resolve this concern, though this has yet to be put to the test.