Making Sense of the Census

Hindsight is 2020

May 07, 2021

I regret to inform you that this newsletter was a day delayed due to yesterday’s Substack outage. I trust that you made it safely through an extra day without some data commentary.

The Census results were released recently, and they were a bit surprising. They were unusually noisy, in that the gap between expected Census results and where they actually came in was much bigger than usual. A bigger surprise, in my opinion, is that despite all the noise over Trump Administration changes that might undercount vulnerable populations, it does not appear to have gone that way. In fact, the Census count came in well above expectations, not below.

The reasons why are confusing, but suggest that it may just be getting harder to determine the ground truth of how many Americans there are.

The Controversy

The 2020 Census was mired in political controversy from the start. This was not unexpected, as the Census is the basis for determining states’ political representation in the US House of Representatives as well as funding for many federal programs. It’s also overseen by political appointees rather than civil servants (though obviously most Census staff are civil servants). This means the Census frequently carries high partisan stakes in a way that would be alien to most other developed countries.

The first and most salient controversy was an attempt by then-Commerce Secretary Wilbur Ross to add a citizenship question to the Census. The controversy was due to what appeared to be the pretextual nature of the question: the government, after all, already knows which residents are citizens. Opponents instead charged that the real point of the change was to discourage immigrants from responding to the Census at all [Vox] and to assign more weight to jurisdictions with fewer immigrants. The Supreme Court agreed, finding that the stated rationale for the change appeared to be mere pretext (backed up by internal administration documents) [Reuters].

Things did not get easier as 2020 arrived, bringing with it the coronavirus crisis. This had two dimensions: a true external crisis and self-induced paralysis. The external crisis was easy enough to see: it was hard to hire Census takers to go door-to-door, and residents were unwilling to answer a knock. The self-induced paralysis was a Trump Administration initiative to wind up Census efforts as soon as possible, regardless of whether the Bureau wanted more time to collect data [CNN]. Opponents suspected that the stated reasons were a pretext for more odious motives, which made some sense given the Supreme Court’s ruling that the previous big Census change was exactly that.

In the end, the process wrapped up, the counts were finalized, and the data were released in early 2021, which brings us to today.

Expected and Unexpected Results

As expected, the results alter the national political scene somewhat. Most of the changes follow familiar trends: as the nation’s demographic center of gravity has moved south and west, the Northeast and Midwest have lost Congressional seats to the South and West [CNN]. One minor surprise is that California lost a seat for the first time ever: while California is still growing, the housing shortage and out-of-control housing costs have caused it to decline relative to other states. A more eye-opening fact is how thin the margins were in some of these decisions: Minnesota was a mere 26 people away from losing a House seat, and New York lost one by falling just 89 people short [NPR].

However, there were also some big surprises from a data quality perspective. Between decennial Censuses, the Census Bureau runs the annual American Community Survey (ACS). Thanks to the ACS, we generally have a pretty good idea of how many people live in the US and in any particular jurisdiction, but this year’s result was surprisingly far off [Washington Post]. The Census didn’t undercount relative to the ACS; it overcounted, producing a national population 0.6% higher than estimated, a much bigger gap than the 0.1% gap in 2010.

These over- and undercounts don’t appear systematic, and they don’t really match pre-Census concerns about undercounting immigrants. Arizona, with a large immigrant and Hispanic population, appeared to be undercounted, but New York, with its own large immigrant population, appeared to be overcounted. Also, just to be clear, these colors aren’t about actual growth (Arizona is growing much faster than New York) but about how far the counts diverged from pre-Census estimates.

The simplest explanation is probably just to shrug and ascribe it to the ever-growing list of Weird Covid-Related Datasets.

Truth is Elusive

However, it’s worth pausing here to note that we don’t really know whether the estimate or the count is correct. The dispute over the citizenship question is a small skirmish in a wider war over Census methodology: whether the Census should be a strict headcount or a statistical estimate. A simple count has well-known accuracy problems, especially a strong tendency to undercount vulnerable populations, so statisticians prefer using sampling to get a better estimate. Sampling is what the American Community Survey does [Census.gov].

The Supreme Court has ruled out sampling for the apportionment count (holding that federal census law requires an actual count), but that hasn’t killed the controversy [Science Clarified]. The Court ruled along partisan lines, and the partisan stakes are obvious. Democrats are more supportive of sampling because they suspect a simple count underrepresents both people and places that lean Democratic, and Republicans oppose it for the same reason. Demographers and statisticians tend to be firmly on the side of sampling, since it should in theory be much less biased.

Here’s the fun part, and the reason the 2020 Census means the takes never have to stop: what if sampling is just getting less accurate? I’ve written about this in the context of political polling [Standard Errors], and there’s perfectly good reason to believe the same low response rates that plague political polling may also plague demographic research. If so, we should expect increasingly large and confusing divergences between the Census and the ACS, as the sampling methodology develops its own biases alongside the known biases of the Census count. In short, we may be moving further and further away from actually knowing how many people live in the United States.
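The mechanism behind the nonresponse worry can be sketched with a toy calculation. Everything here is hypothetical: the two groups, their sizes, and their response rates are invented for illustration and are not Census or ACS figures, and the helper names `naive_share` and `true_share` are made up for this sketch. If a survey takes its respondents at face value, the group that responds less often ends up underrepresented. What drives the bias is the gap between the groups’ response rates, and the concern is that as response rates fall overall, those gaps can widen. Real surveys apply weighting adjustments that try to correct for this, but those adjustments are only as good as their assumptions.

```python
# Toy illustration of nonresponse bias. All numbers are hypothetical,
# not Census or ACS data.

def true_share(pop_a: int, pop_b: int) -> float:
    """Actual share of group B in the population."""
    return pop_b / (pop_a + pop_b)

def naive_share(pop_a: int, pop_b: int, rate_a: float, rate_b: float) -> float:
    """Share of group B among respondents, with no reweighting,
    when each group responds at its own (hypothetical) rate."""
    resp_a = pop_a * rate_a
    resp_b = pop_b * rate_b
    return resp_b / (resp_a + resp_b)

if __name__ == "__main__":
    pop_a, pop_b = 800_000, 200_000  # hypothetical: group B is 20% of the population
    print(f"true share of B: {true_share(pop_a, pop_b):.3f}")
    # Scenarios: high and similar response rates, then lower and more divergent ones
    for rate_a, rate_b in [(0.90, 0.85), (0.30, 0.15), (0.10, 0.03)]:
        est = naive_share(pop_a, pop_b, rate_a, rate_b)
        print(f"response A={rate_a:.2f}, B={rate_b:.2f} -> naive estimate {est:.3f}")
```

Under these made-up numbers, the naive estimate of group B’s share slides from 0.191 to 0.111 to 0.070 against a true value of 0.200. When both groups respond at the same rate, the naive estimate matches the truth no matter how low that rate is.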

It seems that with the Census, as with so many other things, the assumptions and framework you’re using become the central question around which all other data questions revolve.

Standard Errors
