Thesis Project
Outbreak trackers show what is happening. This dashboard shows whether the local health system can handle it.
Global Health Certificate of Distinction, University of Arizona College of Medicine - Phoenix
633
Outbreak records
270
Countries tracked
25
Health indicators
12h
Refresh cycle
$0
Operating cost
When a disease outbreak hits, two questions matter: what is happening, and can the local health system handle it?
Outbreak trackers answer the first question. The WHO publishes Disease Outbreak News. HealthMap aggregates media reports. ProMED circulates expert alerts. These tools tell you that there is a cholera outbreak in Country X with Y confirmed cases.
Health system assessments answer the second question. The Global Health Security Index scores national preparedness. The INFORM Risk Index quantifies vulnerability. The WHO's IHR SPAR framework tracks core surveillance and response capacities. These tools tell you how many hospital beds Country X has per capita, what its physician density looks like, whether its immunization coverage is above or below the WHO threshold.
The problem is that no public tool answers both questions at the same time. When an analyst needs to assess whether a Marburg outbreak in East Africa is a regional emergency or a contained event, they open the WHO DON page in one tab, the GHSI database in another, the World Bank health expenditure data in a third, and cross-reference manually. This is the workflow for every outbreak, every time. Seventeen competitor tools were analyzed before building this dashboard; none of them integrates outbreak alerts with health system capacity data in a single interface.
This project closes that gap. Click an outbreak on the map, and the affected country's full health system profile appears alongside it: beds, physicians, immunization rates, WASH infrastructure, spending per capita, readiness scores, risk levels, and preparedness indices. The question is no longer just "what is happening" but "what is happening, and how prepared is this country to respond."
The project
Four waves of development
The project started as a thesis requirement for the Global Health Certificate of Distinction at the University of Arizona College of Medicine, Phoenix. It did not stay academic for long.
Wave 1 was infrastructure: a Python data pipeline pulling from the WHO Global Health Observatory API and the World Bank API, outputting clean JSON files that a Next.js frontend could read at runtime. No database. No API keys. No cost. The pipeline fetches 633 outbreak records across 99 countries from the WHO Disease Outbreak News endpoint, paginated 100 at a time going back to 2015. A second script pulls 25 health system indicators for 270 countries. A third fetches preparedness indices from three independent sources: the Global Health Security Index, the INFORM Risk Index, and the WHO IHR SPAR framework.
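The pagination loop at the heart of the outbreak fetcher can be sketched like this. The `fetch_page` callable is an assumed stand-in for a thin wrapper around the WHO DON endpoint (the real URL and response shape are not reproduced here); injecting it keeps the loop testable without network access:

```python
from typing import Callable, Dict, List

def fetch_all_outbreaks(fetch_page: Callable[[int, int], List[Dict]],
                        page_size: int = 100) -> List[Dict]:
    """Pull outbreak records one page at a time until the source
    returns an empty page.

    `fetch_page(skip, top)` is any callable returning one page of
    records -- e.g. a wrapper around the WHO DON API (hypothetical
    name; the actual endpoint details live in the fetcher script).
    """
    records: List[Dict] = []
    skip = 0
    while True:
        page = fetch_page(skip, page_size)
        if not page:  # empty page means we've walked past the last record
            break
        records.extend(page)
        skip += page_size
    return records
```

The same skip/top loop, pointed back to 2015, is what yields the 633 records described above.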
Wave 2 was the interface: an interactive Leaflet map with color-coded outbreak markers by disease category (respiratory, vector-borne, hemorrhagic, diarrheal, vaccine-preventable, zoonotic), a sidebar that opens on click with outbreak details alongside capacity benchmarks, and filter controls for disease type, date range, region, and active status. Country profile pages break down all 25 indicators across five groups with WHO benchmarks and progress bars.
Wave 3 added analytical depth: a composite readiness score computed from six core WHO capacity indicators, a risk scoring system combining outbreak pressure with health system vulnerability, a historical outbreak timeline with Recharts area charts, a country comparison tool, disease profile pages for 65 diseases, and regional overviews for all six WHO regions.
Wave 4 focused on credibility: extracting case and death counts from WHO DON HTML pages (365 of 633 outbreaks enriched), fixing disease miscategorizations, splitting multi-country outbreak records, adding data vintage warnings for stale indicators, and documenting the full methodology. The pipeline runs on GitHub Actions, refreshing outbreak data every 12 hours and capacity data quarterly, with zero manual intervention.
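The case/death extraction step can be sketched with regular expressions. These two patterns are illustrative only; real DON pages vary enough in phrasing that the production extractor needs many more cases, which is exactly why 268 outbreaks remain unenriched:

```python
import re
from typing import Dict, Optional

# Illustrative patterns only -- real WHO DON narrative text varies widely.
CASE_RE = re.compile(r"(\d[\d,\s]*)\s+(?:confirmed|suspected)?\s*cases", re.I)
DEATH_RE = re.compile(r"(\d[\d,\s]*)\s+deaths?", re.I)

def extract_counts(text: str) -> Dict[str, Optional[int]]:
    """Pull the first case and death figures out of an outbreak report,
    returning None for either when no matching phrase is found."""
    def to_int(match: Optional[re.Match]) -> Optional[int]:
        if match is None:
            return None
        return int(re.sub(r"[,\s]", "", match.group(1)))
    return {
        "cases": to_int(CASE_RE.search(text)),
        "deaths": to_int(DEATH_RE.search(text)),
    }
```

Reports that discuss several events in one page are the dangerous case: a first-match heuristic like this can attribute numbers to the wrong outbreak, which is the risk noted in the open questions below.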
Data sources
| Source | Data | Coverage | Refresh |
|---|---|---|---|
| WHO DON API | Outbreak alerts | 633 outbreaks | Every 12h |
| WHO GHO API | Capacity indicators | 270 countries | Quarterly |
| World Bank API | Health expenditure, GDP | 270 countries | Quarterly |
| GHSI (NTI/JHU) | Security index | 163 countries | 2021 edition |
| INFORM (EU JRC) | Risk index | 191 countries | Annual |
| WHO IHR SPAR | Core capacities | 218 countries | Annual |
The hard problem
The technical challenge is not any single data source. Each API is well-documented and free. The challenge is making them talk to each other.
WHO uses different country codes than the World Bank. The GHSI publishes a static CSV with its own naming conventions. INFORM uses an Excel file with an inverted risk scale. SPAR data comes through the WHO GHO API but uses a completely different indicator code system. Every data source needs its own parser, its own normalization logic, and its own mapping to ISO 3166-1 alpha-3 codes before any of them can appear on the same page.
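The normalization step reduces, in the simplest case, to an alias table keyed on lowercased source names. The entries below are a tiny illustrative sample; the real pipeline needs a far larger table plus the per-source parsers described above:

```python
from typing import Optional

# Tiny illustrative alias table: each source spells country names its own
# way, so every incoming name is lowercased and mapped to ISO 3166-1
# alpha-3 before any merge. (Sample entries only.)
ISO3_ALIASES = {
    "democratic republic of the congo": "COD",  # WHO style
    "congo, dem. rep.": "COD",                  # World Bank style
    "cote d'ivoire": "CIV",
    "côte d’ivoire": "CIV",
    "turkiye": "TUR",
    "turkey": "TUR",
}

def to_iso3(name: str) -> Optional[str]:
    """Map a source-specific country name to its alpha-3 code, or None
    so unmapped names fail loudly downstream instead of merging wrong."""
    return ISO3_ALIASES.get(name.strip().lower())
```

Returning `None` rather than guessing is deliberate: a silent mismatch would pair one country's outbreaks with another country's capacity data.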
The composite scores required careful design. The readiness score is a weighted average of six core WHO capacity indicators, computed only when at least three are present, so countries with sparse data are not scored on noise. The risk score multiplies outbreak pressure (a recency-weighted severity measure) by vulnerability (the inverse of readiness), producing five severity levels from minimal to critical. The thresholds are not arbitrary: they are calibrated against known outcomes, and countries that scored "critical" during past outbreaks did in fact have worse response timelines.
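A minimal sketch of the two scores. The weights and bucket thresholds here are placeholders, not the calibrated values used in the dashboard:

```python
from typing import Dict, Optional

def readiness_score(indicators: Dict[str, Optional[float]],
                    weights: Dict[str, float]) -> Optional[float]:
    """Weighted mean over the available indicators (0-100 scale).
    Requires at least three of the six to be present; otherwise the
    country gets no score rather than a misleading one."""
    present = {k: v for k, v in indicators.items() if v is not None}
    if len(present) < 3:
        return None
    total_weight = sum(weights[k] for k in present)
    return sum(weights[k] * v for k, v in present.items()) / total_weight

def risk_level(pressure: float, readiness: float) -> str:
    """Risk = outbreak pressure x vulnerability (inverse of readiness),
    bucketed into five levels. Cutoffs below are illustrative only."""
    score = pressure * (1 - readiness / 100)
    for cutoff, label in [(80, "critical"), (60, "high"),
                          (40, "moderate"), (20, "low")]:
        if score >= cutoff:
            return label
    return "minimal"
```

The multiplication is the key design choice: a severe outbreak in a well-prepared country and a mild one in a fragile country can land in the same bucket, which is exactly the comparison the dashboard exists to make.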
The zero-cost constraint shaped every architectural decision. No database means JSON files read from disk. No API keys means only publicly available endpoints. No server-side compute means the Python pipeline runs on GitHub Actions and commits the output. Vercel serves the frontend for free. The entire system operates on the free tiers of every service it touches, which means it can run indefinitely without funding, which means it can outlast the thesis that spawned it.
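With no database, "publishing" a pipeline run is just writing JSON files for the frontend to read. A minimal sketch of that write step (file path and atomic-replace detail are this sketch's assumptions, not a claim about the actual scripts):

```python
import json
import os
import pathlib
import tempfile

def write_json(path: str, payload) -> None:
    """Write one pipeline output file via write-to-temp-then-rename,
    so an interrupted run never leaves a half-written JSON file for
    the frontend (or a git commit) to pick up."""
    target = pathlib.Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    with tempfile.NamedTemporaryFile("w", dir=target.parent,
                                     suffix=".tmp", delete=False) as tmp:
        json.dump(payload, tmp, ensure_ascii=False, indent=2)
        tmp_name = tmp.name
    os.replace(tmp_name, target)  # atomic on POSIX filesystems
```

Eight such files, committed back to the repo by the GitHub Actions run, are the entire "database".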
Data pipeline
WHO / World Bank APIs
Python fetchers, no API keys
Normalize + Merge
ISO3 codes, unit conversion
Compute Scores
Readiness, risk, composite indices
JSON to Disk
8 data files, committed to repo
Next.js Serves
API routes read from disk
GitHub Actions
Auto-refresh, auto-deploy
What you can do with it
Interactive Map
Leaflet + CartoDB, outbreak markers + readiness choropleth
Country Profiles
25 indicators in 5 groups with WHO benchmarks
Disease Profiles
65 diseases with transmission, symptoms, outbreak history
Regional Overviews
6 WHO regions with aggregated metrics
Outbreak Timeline
Historical area chart with category and country filters
Country Comparison
Side-by-side capacity and index charts
Open questions
The biggest open question is deployment. The production build passes, Vercel is configured, and the GitHub Actions workflows are wired, but the site has not been deployed yet. Once live, the first real test is whether the 12-hour outbreak refresh pipeline actually works end-to-end in production, catching new WHO Disease Outbreak News reports and updating the map without manual intervention.
The data has inherent staleness problems. The Global Health Security Index is from 2021. Some WHO capacity indicators for certain countries are 4+ years old. The data vintage warning badges help, but they do not solve the underlying issue: global health data infrastructure updates slowly, and any dashboard built on top of it inherits that latency. The question is how to communicate uncertainty honestly without undermining the tool's usefulness.
Case and death counts are only available for 365 of 633 outbreaks. The remaining 268 either lack structured data in the WHO DON reports or use narrative language that resists automated extraction. NLP-based extraction could improve coverage, but the risk of misattributing numbers to the wrong outbreak is non-trivial when reports discuss multiple events.
The comparison tool currently supports two countries side by side. Analysts working on regional outbreaks (cholera across the Horn of Africa, for example) need multi-country comparison. The UI complexity of comparing four or five countries without becoming unreadable is a design problem that does not have an obvious solution yet.