Thesis Project
Outbreak trackers show what is happening. This dashboard shows whether the local health system can handle it.
Global Health Certificate of Distinction, University of Arizona College of Medicine - Phoenix
633
Outbreak records
270
Countries tracked
25
Health indicators
12h
Refresh cycle
$0
Operating cost
When a disease outbreak hits, two questions matter: what is happening, and can the local health system handle it?
Outbreak trackers answer the first question. The WHO publishes Disease Outbreak News. HealthMap aggregates media reports. ProMED circulates expert alerts. These tools tell you that there is a cholera outbreak in Country X with Y confirmed cases.
Health system assessments answer the second question. The Global Health Security Index scores national preparedness. The INFORM Risk Index quantifies vulnerability. The WHO's IHR SPAR framework tracks core surveillance and response capacities. These tools tell you how many hospital beds Country X has per capita, what its physician density looks like, whether its immunization coverage is above or below the WHO threshold.
The problem is that no public tool answers both questions at the same time. When an analyst needs to assess whether a Marburg outbreak in East Africa is a regional emergency or a contained event, they open the WHO DON page in one tab, the GHSI database in another, the World Bank health expenditure data in a third, and cross-reference manually. This is the workflow for every outbreak, every time. Seventeen competitor tools were analyzed before building this dashboard; none of them integrates outbreak alerts with health system capacity data in a single interface.
This project closes that gap. Click an outbreak on the map, and the affected country's full health system profile appears alongside it: beds, physicians, immunization rates, WASH infrastructure, spending per capita, readiness scores, risk levels, and preparedness indices. The question is no longer just "what is happening" but "what is happening, and how prepared is this country to respond."
The project
Four waves of development
The project started as a thesis requirement for the Global Health Certificate of Distinction at the University of Arizona College of Medicine, Phoenix. It did not stay academic for long.
Wave 1 was infrastructure: a Python data pipeline pulling from the WHO Global Health Observatory API and the World Bank API, outputting clean JSON files that a Next.js frontend could read at runtime. No database. No API keys. No cost. The pipeline fetches 633 outbreak records across 99 countries from the WHO Disease Outbreak News endpoint, paginated 100 at a time going back to 2015. A second script pulls 25 health system indicators for 270 countries. A third fetches preparedness indices from three independent sources: the Global Health Security Index, the INFORM Risk Index, and the WHO IHR SPAR framework.
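The pagination loop at the heart of the outbreak fetcher can be sketched like this. The `fetch_page` callable is an assumed stand-in for a thin wrapper around the WHO DON endpoint (the real URL and response shape are not reproduced here); injecting it keeps the loop testable without network access:

```python
from typing import Callable, Dict, List

def fetch_all_outbreaks(fetch_page: Callable[[int, int], List[Dict]],
                        page_size: int = 100) -> List[Dict]:
    """Pull outbreak records one page at a time until the source
    returns an empty page.

    `fetch_page(skip, top)` is any callable returning one page of
    records -- e.g. a wrapper around the WHO DON API (hypothetical
    name; the actual endpoint details live in the fetcher script).
    """
    records: List[Dict] = []
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        if not page:  # empty page means we've walked past the last record
            break
        records.extend(page)
        skip += page_size
    return records
```

The same skip/top loop, pointed back to 2015, is what yields the 633 records described above.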
Wave 2 was the interface: an interactive Leaflet map with color-coded outbreak markers by disease category (respiratory, vector-borne, hemorrhagic, diarrheal, vaccine-preventable, zoonotic), a sidebar that opens on click with outbreak details alongside capacity benchmarks, and filter controls for disease type, date range, region, and active status. Country profile pages break down all 25 indicators across five groups with WHO benchmarks and progress bars.
Wave 3 added analytical depth: a composite readiness score computed from six core WHO capacity indicators, a risk scoring system combining outbreak pressure with health system vulnerability, a historical outbreak timeline with Recharts area charts, a country comparison tool, disease profile pages for 65 diseases, and regional overviews for all six WHO regions.
Wave 4 focused on credibility: extracting case and death counts from WHO DON HTML pages (365 of 633 outbreaks enriched), fixing disease miscategorizations, splitting multi-country outbreak records, adding data vintage warnings for stale indicators, and documenting the full methodology. The pipeline runs on GitHub Actions, refreshing outbreak data every 12 hours and capacity data quarterly, with zero manual intervention.
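The case/death extraction step can be sketched with regular expressions. These two patterns are illustrative only; real DON pages vary enough in phrasing that the production extractor needs many more cases, which is exactly why 268 outbreaks remain unenriched:

```python
import re
from typing import Dict, Optional

# Illustrative patterns only -- real WHO DON narrative text varies widely.
CASE_RE = re.compile(r"(\d[\d,\s]*)\s+(?:confirmed|suspected)?\s*cases", re.I)
DEATH_RE = re.compile(r"(\d[\d,\s]*)\s+deaths?", re.I)

def extract_counts(text: str) -> Dict[str, Optional[int]]:
    """Pull the first case and death figures out of an outbreak report,
    returning None for either when no matching phrase is found."""
    def to_int(match: Optional[re.Match]) -> Optional[int]:
        if match is None:
            return None
        return int(re.sub(r"[,\s]", "", match.group(1)))
    return {
        "cases": to_int(CASE_RE.search(text)),
        "deaths": to_int(DEATH_RE.search(text)),
    }
```

Reports that discuss several events in one page are the dangerous case: a first-match heuristic like this can attribute numbers to the wrong outbreak, which is the risk noted in the open questions below.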
Data sources
| Source | Data | Coverage | Refresh |
|---|---|---|---|
| WHO DON API | Outbreak alerts | 633 outbreaks | Every 12h |
| WHO GHO API | Capacity indicators | 270 countries | Quarterly |
| World Bank API | Health expenditure, GDP | 270 countries | Quarterly |
| GHSI (NTI/JHU) | Security index | 163 countries | 2021 edition |
| INFORM (EU JRC) | Risk index | 191 countries | Annual |
| WHO IHR SPAR | Core capacities | 218 countries | Annual |
The hard problem
The technical challenge is not any single data source. Each API is well-documented and free. The challenge is making them talk to each other.
WHO uses different country codes than the World Bank. The GHSI publishes a static CSV with its own naming conventions. INFORM uses an Excel file with an inverted risk scale. SPAR data comes through the WHO GHO API but uses a completely different indicator code system. Every data source needs its own parser, its own normalization logic, and its own mapping to ISO 3166-1 alpha-3 codes before any of them can appear on the same page.
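The normalization step reduces, in the simplest case, to an alias table keyed on lowercased source names. The entries below are a tiny illustrative sample; the real pipeline needs a far larger table plus the per-source parsers described above:

```python
from typing import Optional

# Tiny illustrative alias table: each source spells country names its own
# way, so every incoming name is lowercased and mapped to ISO 3166-1
# alpha-3 before any merge. (Sample entries only.)
ISO3_ALIASES = {
    "democratic republic of the congo": "COD",  # WHO style
    "congo, dem. rep.": "COD",                  # World Bank style
    "cote d'ivoire": "CIV",
    "côte d’ivoire": "CIV",
    "turkiye": "TUR",
    "turkey": "TUR",
}

def to_iso3(name: str) -> Optional[str]:
    """Map a source-specific country name to its alpha-3 code, or None
    so unmapped names fail loudly downstream instead of merging wrong."""
    return ISO3_ALIASES.get(name.strip().lower())
```

Returning `None` rather than guessing is deliberate: a silent mismatch would pair one country's outbreaks with another country's capacity data.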
The composite scores required careful design. The readiness score is a weighted average of six core WHO capacity indicators, computed only when at least three are present, so countries with sparse data are not scored on noise. The risk score multiplies outbreak pressure (a recency-weighted severity measure) by vulnerability (the inverse of readiness), producing five severity levels from minimal to critical. The thresholds are not arbitrary: they are calibrated against known outcomes, and countries that scored "critical" during past outbreaks did in fact have worse response timelines.
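A minimal sketch of the two scores. The weights and bucket thresholds here are placeholders, not the calibrated values used in the dashboard:

```python
from typing import Dict, Optional

def readiness_score(indicators: Dict[str, Optional[float]],
                    weights: Dict[str, float]) -> Optional[float]:
    """Weighted mean over the available indicators (0-100 scale).
    Requires at least three of the six to be present; otherwise the
    country gets no score rather than a misleading one."""
    present = {k: v for k, v in indicators.items() if v is not None}
    if len(present) < 3:
        return None
    total_weight = sum(weights[k] for k in present)
    return sum(weights[k] * v for k, v in present.items()) / total_weight

def risk_level(pressure: float, readiness: float) -> str:
    """Risk = outbreak pressure x vulnerability (inverse of readiness),
    bucketed into five levels. Cutoffs below are illustrative only."""
    score = pressure * (1 - readiness / 100)
    for cutoff, label in [(80, "critical"), (60, "high"),
                          (40, "moderate"), (20, "low")]:
        if score >= cutoff:
            return label
    return "minimal"
```

The multiplication is the key design choice: a severe outbreak in a well-prepared country and a mild one in a fragile country can land in the same bucket, which is exactly the comparison the dashboard exists to make.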
The zero-cost constraint shaped every architectural decision. No database means JSON files read from disk. No API keys means only publicly available endpoints. No server-side compute means the Python pipeline runs on GitHub Actions and commits the output. Vercel serves the frontend for free. The entire system operates on the free tiers of every service it touches, which means it can run indefinitely without funding, which means it can outlast the thesis that spawned it.
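With no database, "publishing" a pipeline run is just writing JSON files for the frontend to read. A minimal sketch of that write step (file path and atomic-replace detail are this sketch's assumptions, not a claim about the actual scripts):

```python
import json
import os
import pathlib
import tempfile

def write_json(path: str, payload) -> None:
    """Write one pipeline output file via write-to-temp-then-rename,
    so an interrupted run never leaves a half-written JSON file for
    the frontend (or a git commit) to pick up."""
    target = pathlib.Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with tempfile.NamedTemporaryFile("w", dir=target.parent,
                                     suffix=".tmp", delete=False) as tmp:
        json.dump(payload, tmp, ensure_ascii=False, indent=2)
        tmp_name = tmp.name
    os.replace(tmp_name, target)  # atomic on POSIX filesystems
```

Eight such files, committed back to the repo by the GitHub Actions run, are the entire "database".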
Data pipeline
WHO / World Bank APIs
Python fetchers, no API keys
Normalize + Merge
ISO3 codes, unit conversion
Compute Scores
Readiness, risk, composite indices
JSON to Disk
8 data files, committed to repo
Next.js Serves
API routes read from disk
GitHub Actions
Auto-refresh, auto-deploy
What you can do with it
Interactive Map
Leaflet + CartoDB, outbreak markers + readiness choropleth
Country Profiles
25 indicators in 5 groups with WHO benchmarks
Disease Profiles
65 diseases with transmission, symptoms, outbreak history
Regional Overviews
6 WHO regions with aggregated metrics
Outbreak Timeline
Historical area chart with category and country filters
Country Comparison
Side-by-side capacity and index charts
Open questions
The biggest open question is deployment. The production build passes, Vercel is configured, and the GitHub Actions workflows are wired, but the site has not been deployed yet. Once live, the first real test is whether the 12-hour outbreak refresh pipeline actually works end-to-end in production, catching new WHO Disease Outbreak News reports and updating the map without manual intervention.
The data has inherent staleness problems. The Global Health Security Index is from 2021. Some WHO capacity indicators for certain countries are 4+ years old. The data vintage warning badges help, but they do not solve the underlying issue: global health data infrastructure updates slowly, and any dashboard built on top of it inherits that latency. The question is how to communicate uncertainty honestly without undermining the tool's usefulness.
Case and death counts are only available for 365 of 633 outbreaks. The remaining 268 either lack structured data in the WHO DON reports or use narrative language that resists automated extraction. NLP-based extraction could improve coverage, but the risk of misattributing numbers to the wrong outbreak is non-trivial when reports discuss multiple events.
The comparison tool currently supports two countries side by side. Analysts working on regional outbreaks (cholera across the Horn of Africa, for example) need multi-country comparison. The UI complexity of comparing four or five countries without becoming unreadable is a design problem that does not have an obvious solution yet.