A re-reading of The Economist's Tracking the Stealthy Killer

A map that already lied.

In early March 2020, the world's COVID-19 case map was less a record of where the virus was than a record of where countries were looking. Tourism flows showed exactly who was looking the wrong way.

01 · The hook

A map that already lied

On 4 March 2020, the Johns Hopkins dashboard counted 95,124 COVID-19 cases worldwide. China held 80,271 of them. Hubei province by itself held 67,332 — more than 70% of the global total in a single Chinese province. Outside China, only three countries had crossed 2,000 cases: South Korea, Italy and Iran.

Two days later, The Economist published a Graphic Detail piece arguing this map was already wrong. Not the China numbers, which dwarfed everything. The numbers everywhere else — the apparently quiet places where governments were still saying "we have a few imported cases, things are under control."

What the piece had was an unusual yardstick: how many Chinese tourists each country had received the previous summer.

Reported COVID-19 cases on 4 March 2020 — top 15 countries
China held 84% of the global total. Outside it, almost nothing.
Source: Johns Hopkins University CSSE dashboard, 4 Mar 2020. Log scale on x.
An empty international departures terminal in early March 2020.
Early March 2020. The numbers were small. The airports were already emptying.

02 · The yardstick

The yardstick: Chinese tour groups

The Economist's data came from a slightly improbable place. China's Ministry of Culture and Tourism tracks tour-group travellers — both directions — for the top 30 destination/origin countries plus continent residuals. It was the most recent globally consistent measure of who was moving between China and the rest of the world: Q3 2019, the latest quarter available when the analysis ran.

The flows are wildly uneven. Thailand alone saw 1.55 million Chinese tour-group trips in a single quarter, roughly 24% more than Japan's 1.25 million. Taiwan, Vietnam, Singapore, Malaysia and Russia each handled between 400,000 and 950,000. The top ten destinations accounted for 74% of all flow.

The intuition was simple. If the virus moves with travellers, then the more travellers a country had received, the more cases it should have — controlling, roughly, for how hard each country was looking.

Chinese tour-group flows, Q3 2019 — top 15 destinations
OECD bars in warm tone, non-OECD muted. The largest exposures sit outside the OECD.
Source: China Ministry of Culture and Tourism, Q3 2019 (mean of inbound and outbound).

03 · The model

The OECD line

Fit a single line through the 34 OECD countries and you get a remarkably clean answer: log(cases+1) = -8.44 + 1.13 × log(tourism). The slope is highly significant (p < 1e-7) and the model explains 59% of the variance in log-cases across OECD members.

The choice to fit only the OECD was deliberate. OECD countries had broadly similar testing infrastructure in early March 2020 — what they reported was, more or less, what they actually saw. Project that line outward to non-OECD countries and you get an "expected" caseload: how many cases each country would have if its surveillance worked like an OECD country's. The residual — distance from the line — becomes a surveillance gap.
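As a minimal sketch of the fit-and-project step described above: the source gives only the fitted equation, so the ordinary-least-squares arithmetic, the use of natural logs, and the helper names `fit_log_log` and `residual` are all assumptions made here for illustration. The rows are made-up placeholders, not the article's dataset.

```python
import math

def fit_log_log(rows):
    """OLS of log(cases + 1) on log(tourism). Returns (intercept, slope)."""
    xs = [math.log(r["tourism"]) for r in rows]
    ys = [math.log(r["cases"] + 1) for r in rows]
    n = len(rows)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def residual(row, intercept, slope):
    """Distance from the fit line in log-units (negative = below the line)."""
    predicted_log = intercept + slope * math.log(row["tourism"])
    return math.log(row["cases"] + 1) - predicted_log

# Placeholder rows: invented values, NOT the article's data.
countries = [
    {"name": "A", "oecd": True,  "tourism": 10_000,  "cases": 3},
    {"name": "B", "oecd": True,  "tourism": 50_000,  "cases": 20},
    {"name": "C", "oecd": True,  "tourism": 200_000, "cases": 60},
    {"name": "D", "oecd": True,  "tourism": 800_000, "cases": 400},
    {"name": "E", "oecd": False, "tourism": 400_000, "cases": 2},
]

# Fit on the OECD subset only, then score every country against that line.
oecd_rows = [r for r in countries if r["oecd"]]
intercept, slope = fit_log_log(oecd_rows)
residuals = {r["name"]: residual(r, intercept, slope) for r in countries}
```

The design point is that the non-OECD rows never influence the line: they are only measured against it, which is what lets the residual be read as a gap rather than as noise in the fit.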

The OECD-only fit, drawn through 124 countries
log(cases+1) = -8.44 + 1.13 × log(tourism). R² = 0.59. Fit on OECD only; non-OECD shown for context.
Source: JHU CSSE (cases) + China MoCT (tourism). OLS fit over 34 OECD countries.

04 · The reveal

Below the line

Plotted on a single chart, the answer is immediate. OECD countries cluster around the line. Non-OECD countries scatter widely — and a striking number of them sit far below.

Russia is the most extreme. With a mean flow of 434,000 Chinese tour-group trips, the model expected 517 cases on 4 March. The country reported three. Indonesia: 330 expected, two reported. Myanmar: 110 expected, zero reported. The Philippines, Vietnam and Thailand each reported roughly 50 to 70 times fewer cases than the OECD-fit line implies.

These were not obscure countries with tiny tourism. Thailand was the single largest destination of Chinese tour groups in Q3 2019. Vietnam and Indonesia were both in the global top ten. The places best positioned to import early COVID-19 were the places reporting the fewest cases.
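The expected-case figures quoted above can be roughly reproduced by plugging a tourism flow into the published equation. This sketch assumes natural logs (the article does not say which base it uses) and relies on the rounded coefficients, so it lands near 500 for Russia rather than exactly on the quoted 517. `expected_cases` is a hypothetical helper name.

```python
import math

# Rounded coefficients as quoted in the article.
INTERCEPT = -8.44
SLOPE = 1.13

def expected_cases(tourism):
    """Invert log(cases + 1) = INTERCEPT + SLOPE * log(tourism).

    Assumption: log is the natural logarithm.
    """
    return math.exp(INTERCEPT + SLOPE * math.log(tourism)) - 1

# Russia's quoted mean tour-group flow for Q3 2019.
russia_expected = expected_cases(434_000)
```

The small gap between this result and the article's figure is what one would expect from coefficients and flows rounded for publication; it is not evidence of a different model.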

Tourism vs. reported cases, all 124 countries
Each dot is one country, with its tourism flow, reported cases, predicted cases and multiplier. The line is the OECD-only OLS fit. Outliers are labelled.
Legend: OECD (n=34); non-OECD (n=90); OECD-only fit line.
Both axes log; cases axis offset by +1 for log.

05 · The other end

Above the line

At the opposite end of the residual list are countries whose case counts are running ahead of their tourism. Iran is the most extreme: 2,922 reported cases against a tourism-implied 12. Italy reports 3,089 against 156. South Korea: 5,621 against 392. These are not surveillance failures — they are countries where the virus had already gone domestic, where local transmission had outpaced anything imports alone could explain.

By this point the model is doing two things at once. Below the line: a country where the virus is probably present but invisible to the surveillance system. Above the line: a country where the surveillance system is working but the outbreak has accelerated past what tourism alone seeded.

15 countries reporting more cases than tourism alone would predict
Residual measured in log-units above the OECD-fit line. Iran is in a class of its own.
Positive residuals = above the line = local transmission outpacing imports.

06 · The asymmetry

A surveillance gap, by group

Step back from individual countries and the systematic gap is unmistakable. The OECD residuals are mean-zero by construction. The 90 non-OECD countries, scored against that same line, sit on average 0.85 log-units below it, equivalent to reporting 43% of what the OECD pattern would predict. Some 64% of non-OECD countries sit below the line, against 56% of OECD members.

Pore over the residuals and the diagnosis becomes hard to ignore: the early-March case map is shaped less by where the virus is than by who is testing for it.
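The step from a mean non-OECD residual of -0.85 log-units to "43% of expected" is a single exponentiation, sketched here on the assumption that the model uses natural logs. `share_of_expected` is a hypothetical helper name.

```python
import math

def share_of_expected(log_residual):
    """Convert a residual in natural-log units to a ratio of reported to predicted.

    Strictly, this is the ratio of (cases + 1) terms, because the model
    is fitted on log(cases + 1).
    """
    return math.exp(log_residual)

# Mean non-OECD residual quoted in the article.
share = share_of_expected(-0.85)  # about 0.43
```

That the quoted 43% matches exp(-0.85) is itself a hint that the regression was run in natural logs rather than base 10.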

Distribution of residuals — OECD vs non-OECD
Two overlapping histograms of every country's distance from the fit line.
Bin width: 0.5 log-units. Vertical lines mark each group's median.

07 · The multiplier

What the model says is missing

Read the model literally and the implied under-counts are vast. Russia: 172×. Indonesia: 165×. Myanmar: 110×. Philippines: 69×. Vietnam: 54×. Thailand: 51×. These are not absolute predictions of the truth, but they are estimates of how far the reported numbers are from the tourism-implied baseline.

Predicted vs reported:
172× Russia
165× Indonesia
110× Myanmar
69× Philippines
54× Vietnam
51× Thailand
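The multipliers are predicted cases divided by reported cases. A minimal sketch using the predicted and reported figures quoted earlier in the piece follows. The floor of 1 on reported cases is an assumption made here so that a zero count does not divide by zero; the article does not say how it handled that case, although the floor does reproduce the 110× shown for Myanmar. `multiplier` is a hypothetical helper name.

```python
def multiplier(predicted, reported):
    """Implied under-detection: predicted / reported.

    Assumption: reported is floored at 1 so a zero count stays finite.
    """
    return predicted / max(reported, 1)

# (predicted, reported) pairs as quoted in the article for 4 March 2020.
quoted = {
    "Russia": (517, 3),
    "Indonesia": (330, 2),
    "Myanmar": (110, 0),
}

multipliers = {
    name: round(multiplier(predicted, reported))
    for name, (predicted, reported) in quoted.items()
}
```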

Subsequent peer-reviewed work would land in roughly the same place. A Pulmonology paper using case-fatality ratios estimated that Iran's true caseload was about 34 times its reported total in mid-March, Italy's 73 times, Spain's 161 times. A Science paper estimated 86% of pre-23-January infections in China had gone undocumented. The tourism model, fit on a single morning's data, was pointing in the same direction that these far more elaborate methods would later confirm.

In Indonesia, community transmission would not be officially acknowledged until late March. Iran's hospitals were already overwhelmed when its case count was being reported in the hundreds. The Economist's piece, days before either of those facts became visible, said: look at the residuals.

Implied under-detection multipliers — top 12
Predicted ÷ reported, log scale. Read as "the model implies this country was reporting one in N of its tourism-baseline cases."
Compare to subsequent peer-reviewed estimates: Iran 34×, Italy 73×, Spain 161× (Lau et al. 2020, Pulmonology).
An empty city street in early March 2020 with closed shop shutters and a single masked pedestrian in the distance.
Late February / early March 2020. While Russia, Indonesia and the Philippines were still reporting handfuls of cases, streets like this were emptying out city by city. Editorial illustration.

08 · The close

Reading the gaps

What was new about the piece was not the regression. It was the framing. Most early-March 2020 reporting treated case counts as a thermometer: a higher number meant a worse outbreak. The Economist's argument was that the thermometer itself was uneven — some countries were holding it under their tongue and others were leaving it in their pocket — and that tourism flows could tell you which was which.

Read forwards, the residuals were a bet on what would happen next. The countries furthest below the line, the model implied, were the countries whose first wave was still hidden. Most of them would be in the news within weeks.

The lasting lesson is not the specific multipliers — those are noisy, single-quarter, single-day estimates. It is the move itself: when you cannot measure a thing directly, find a proxy for what should be there, fit it on the part of the world that measures well, and read the gap.

Caveats. Tour-group flows are a proxy: not all travellers move in groups, and the data are six months old. The OECD-only fit projects an "expected" caseload onto countries with very different surveillance, healthcare and demographics, so a residual is not proof of under-reporting. All confirmed-case figures are reported numbers as of 4 March 2020, themselves subject to lag and differences in reporting practice.