MatchStudy

Linear algebra framework modeling the medical residency Match as spectral decomp...

linear-algebra, medical-education, residency, Python, NumPy, SciPy

Premise

The residency match is often described as chaotic and credential-dependent, but its actual structure is a competition market with emergent tiers. The question: can linear algebra expose why competition clusters, why small CV improvements don't change outcomes, and which applicant features actually predict match rank, and can it do so before an applicant is inside the system?

How it evolved

Started as a formal restatement of the Gale-Shapley problem. Grew into a 15-matrix framework spanning five layers: ground-truth affinity (student and program feature matrices, two-sided preference weights), spectral decomposition of competition structure, noisy perception models for impostor syndrome and information asymmetry, action matrices for application masking and rank truncation, and a Gold/Silver/Regular signaling layer. A static three-page frontend lets users explore synthetic match markets and project their own archetype without running the Python engine. 104+ tests. IRB pending for real survey data.

Technical crux

The competition matrix (Ap @ Ap.T, where Ap = S @ Wp.T) concentrates 99.5% of its variance in the first eigenmode. That is the formal statement that competitive specialties attract homogeneous candidates competing on identical features. It is also why adding a research line to a CV that already has two doesn't move you: small CV changes shift you along the dominant eigenvector, where everyone else already lives. Eigenvectors of Ap @ Ap.T are applicant archetypes and eigenvectors of Ap.T @ Ap are program tiers; both emerge from the data without hand-labeling. The perception layer is what makes the model clinically useful: S_hat adds noise and deflation to self-assessment (impostor syndrome), and an information_level parameter interpolates between perfect knowledge of program values and a uniform prior.
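The decomposition described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's engine: it assumes S is a (students x features) matrix and Wp a (programs x features) weight matrix, and the function name competition_spectrum, the synthetic data, and the near-identical program weights are all hypothetical choices made for the example. It uses a single SVD of Ap, since the left singular vectors are eigenvectors of Ap @ Ap.T, the right singular vectors are eigenvectors of Ap.T @ Ap, and the squared singular values are their shared nonzero eigenvalues.

```python
import numpy as np

def competition_spectrum(S, Wp):
    """Eigen-structure of the competition matrix via one SVD of Ap = S @ Wp.T.

    Assumed shapes (not confirmed by the source):
      S  : (n_students, n_features) applicant features
      Wp : (n_programs, n_features) program preference weights
    """
    Ap = S @ Wp.T                                   # (n_students, n_programs)
    U, s, Vt = np.linalg.svd(Ap, full_matrices=False)
    eigvals = s ** 2                                # eigenvalues of Ap @ Ap.T and Ap.T @ Ap
    return eigvals, U, Vt.T                         # columns: archetypes, tiers

# Hypothetical synthetic market: programs weight features almost identically.
rng = np.random.default_rng(0)
S = rng.uniform(0.5, 1.0, size=(300, 5))
base = np.array([0.40, 0.30, 0.15, 0.10, 0.05])
Wp = base + rng.normal(0.0, 0.01, size=(60, 5))

eigvals, archetypes, tiers = competition_spectrum(S, Wp)
share = eigvals / eigvals.sum()                     # fraction carried by each eigenmode
print(f"first eigenmode share: {share[0]:.4f}")
```

When programs value nearly the same features, Ap is close to rank one, so the first entry of share dominates. The exact figure depends on the synthetic inputs and is not meant to reproduce the 99.5% reported above.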

Findings

Full Python engine with all 15 matrices, spectral decomposition, stable Gale-Shapley matching, and policy intervention simulation. Synthetic data analysis: impostor syndrome (deflation=0.10) measurably degrades average match rank; signaling (Gold/Silver budget) improves average rank from 7.4 to 6.4; full information versus zero information improves average rank by ~15 positions. Application cap of ~15 needed for full match rate in a 300-student/60-program synthetic market. 104+ passing tests across all modules. Three-page static frontend live on GitHub Pages.
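To make the matching step and the application cap concrete, here is a minimal sketch of student-proposing deferred acceptance (Gale-Shapley) with program capacities and optional rank-list truncation. It is an illustration under stated assumptions, not the engine's code: the function name deferred_acceptance, the app_cap parameter, the uniform capacity, and the toy score matrices are all hypothetical.

```python
import numpy as np

def deferred_acceptance(student_scores, program_scores, capacity=1, app_cap=None):
    """Student-proposing deferred acceptance.

    student_scores[i, j]: how much student i values program j.
    program_scores[j, i]: how much program j values student i.
    capacity: seats per program (assumed equal for every program).
    app_cap: if set, each student lists only their top `app_cap` programs.
    Returns match[i] = program index, or -1 if student i is unmatched.
    """
    n_students, n_programs = student_scores.shape
    rank_lists = np.argsort(-student_scores, axis=1)   # best program first
    if app_cap is not None:
        rank_lists = rank_lists[:, :app_cap]           # rank-list truncation
    next_choice = np.zeros(n_students, dtype=int)
    match = np.full(n_students, -1)
    held = [[] for _ in range(n_programs)]             # tentative acceptances
    free = list(range(n_students))
    while free:
        i = free.pop()
        if next_choice[i] >= rank_lists.shape[1]:
            continue                                   # list exhausted: stays unmatched
        j = rank_lists[i, next_choice[i]]
        next_choice[i] += 1
        held[j].append(i)
        match[i] = j
        if len(held[j]) > capacity:
            # Over capacity: the program releases its least-preferred held student.
            worst = min(held[j], key=lambda s: program_scores[j, s])
            held[j].remove(worst)
            match[worst] = -1
            free.append(worst)
    return match

# Toy market: everyone prefers program 0 > 1 > 2; programs prefer student 0 > 1 > 2.
student_scores = np.array([[3, 2, 1]] * 3)
program_scores = np.array([[3, 2, 1]] * 3)
full = deferred_acceptance(student_scores, program_scores)
capped = deferred_acceptance(student_scores, program_scores, app_cap=1)
print(full.tolist(), capped.tolist())  # [0, 1, 2] [0, -1, -1]
```

With full rank lists every student matches. With a cap of one, all three apply only to program 0 and two go unmatched, which is the mechanism behind needing a sufficiently large cap for a full match rate.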

Open questions

The model is calibrated on synthetic data. Real survey validation is blocked on IRB approval. The perception layer treats self-assessment noise as Gaussian; empirical data would tell us whether impostor syndrome deflation is constant or systematically correlated with demographic factors. The 15-matrix framework generalizes naturally to other matching markets (elite PhD programs, visa lotteries, law firm recruiting); that extension is designed but not yet implemented. The deeper policy question: if program tier structure is spectral, can targeted signaling policy flatten the eigenspectrum, or does competition reconcentrate around whatever signal is available?
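For reference, the Gaussian perception assumption mentioned above could look like the following sketch. The functional forms are assumptions made for illustration: multiplicative deflation plus additive Gaussian noise for S_hat, and a linear blend toward equal feature weights for information_level. The function names and the noise_sd parameter are hypothetical and not taken from the engine.

```python
import numpy as np

def perceived_self(S, deflation=0.10, noise_sd=0.05, rng=None):
    """S_hat: self-assessment shrunk by `deflation` plus Gaussian noise (assumed form)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(0.0, noise_sd, size=S.shape)
    return (1.0 - deflation) * S + noise

def perceived_program_weights(Wp, information_level):
    """Blend true program weights with a uniform prior over features (assumed form).

    information_level = 1.0 -> perfect knowledge of Wp
    information_level = 0.0 -> every feature weighted equally
    """
    uniform = np.full_like(Wp, 1.0 / Wp.shape[1])
    return information_level * Wp + (1.0 - information_level) * uniform

# Hypothetical inputs for a quick check.
rng = np.random.default_rng(1)
S = rng.uniform(0.0, 1.0, size=(4, 3))
Wp = np.array([[0.6, 0.3, 0.1],
               [0.2, 0.5, 0.3]])

S_hat = perceived_self(S, deflation=0.10, noise_sd=0.05, rng=rng)
Wp_half = perceived_program_weights(Wp, information_level=0.5)
```

Replacing the constant deflation with a per-applicant vector would be one way to test whether deflation varies systematically once survey data is available.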

Detailed case study in progress.

2024