Case study

This sugar is safe for diabetics, but you can't afford it.

L-glucose costs $50,000/kg. Not because the chemistry is impossible, but because no one has mapped the shortest enzymatic path to make it cheaply.

135 compounds · 696 reactions · 94 stereoisomers · 41 polyols · 49 tests

The problem

L-glucose is chemically identical to the sugar in your blood, with one difference: every chiral center is flipped. Your gut cannot absorb it. Your cells cannot metabolize it. It tastes sweet, triggers no insulin response, and passes through you unchanged. A perfect sweetener for diabetics. It also costs roughly $50,000 per kilogram, because no one has figured out how to make it cheaply.

The cost is not a chemistry problem. It is a pathfinding problem. L-glucose exists in the same reaction space as D-glucose, separated by enzymatic transformations that have been partially characterized, inconsistently documented, and scattered across decades of literature with no unified map. There are hundreds of monosaccharide stereoisomers: mirror-image sugars, chain-length variants, ring-open and ring-closed forms, polyol intermediates. The enzymes that interconvert them (epimerases, isomerases, reductases) are known in some cases and hypothesized in others. The question is not whether a synthesis route exists. The question is which route is shortest, which steps are experimentally validated, and how much trust you can place in each link.

No existing database answers this cleanly. KEGG and BRENDA catalog reactions, but their coverage is uneven. Common metabolic sugars are richly annotated; rare stereoisomers are sparse or absent. Pulling from those sources means inheriting their gaps. If L-talose or D-allose is missing from the database, your pathfinding algorithm will never find a route through it, even if that route is the shortest one in reality.

That is the problem SUGAR was built to solve. Not by scraping databases and hoping for completeness, but by generating the entire landscape from first principles: every stereoisomer that can exist by the rules of organic chemistry, every reaction type that connects them, every evidence tier from rigorously validated to theoretically plausible. Then asking the shortest-path question against that complete graph.

SUGAR — computational platform for enzymatic synthesis pathway discovery between sugars.
Tags: biochemistry · stereochemistry · graph theory · Python · Next.js · TypeScript

How it was rebuilt

The first version of SUGAR was built around NetworkX for graph computation, Neo4j for storage, and a static vis.js frontend for visualization. It worked, but it carried the weight of every architectural decision made before the scope was fully understood. Neo4j required a running server. The Python graph logic was entangled with the database layer. Adding a new reaction type meant touching four separate files. The frontend had no pathfinding at all. It could show the graph, but not answer the central question.

The 2026 rebuild started from a different premise: what if the pipeline produced nothing but static JSON, and the frontend consumed nothing but that JSON? The Python side became a deterministic enumeration engine. It generates every aldose and ketose stereoisomer from C2 through C7 by exhaustive Cartesian product of R/S configurations at each chiral center (94 monosaccharides and 41 polyols), then applies reaction rules to produce 696 edges (478 epimerizations, 124 isomerizations, 94 reductions), validates mass balance on every reaction, and writes the result to flat JSON files. No database. No runtime dependencies. The same input always produces the same output.
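The exhaustive enumeration described above can be sketched with `itertools.product`. The chiral-center counts (n−2 for aldoses, n−3 for 2-ketoses) and the dict layout are my assumptions, not the project's actual data model, but the arithmetic reproduces the stated count of 94 monosaccharides:

```python
# Sketch of the exhaustive stereoisomer enumeration: Cartesian product
# of R/S configurations at each chiral center, C2 through C7.
# Assumptions (not from the source): aldoses with n carbons have n-2
# chiral centers, 2-ketoses have n-3; the zero-center cases
# (glycolaldehyde, dihydroxyacetone) each contribute one compound.
from itertools import product

def enumerate_monosaccharides(min_c=2, max_c=7):
    compounds = []
    for n in range(min_c, max_c + 1):
        for family, centers in (("aldose", n - 2), ("ketose", n - 3)):
            if centers < 0:
                continue  # no C2 ketose exists
            for config in product("RS", repeat=centers):
                compounds.append({
                    "family": family,
                    "carbons": n,
                    "config": "".join(config) or "achiral",
                })
    return compounds

sugars = enumerate_monosaccharides()
print(len(sugars))  # 94: 63 aldoses + 31 ketoses
```

Polyols are omitted here because alditols have internal symmetry (meso forms), so their count is not a plain power of two.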

The frontend loads that JSON at build time, constructs an in-memory adjacency graph using a typed TypeScript representation, and runs Yen's K-shortest-paths algorithm entirely client-side. There is no server, no API, no request that can fail. The app deploys to Vercel as a fully static site and requires no maintenance to keep running. The tradeoff is explicit: client-side pathfinding does not scale to arbitrarily large graphs. That is a documented design constraint that can be revisited when the graph actually grows large enough to matter.
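The client-side routine can be illustrated (in Python rather than the project's TypeScript) with NetworkX's `shortest_simple_paths`, which implements Yen's algorithm for weighted graphs. The compound names and edge weights below are hypothetical stand-ins for the evidence-weighted cost function:

```python
# Yen's K-shortest paths over a toy sugar graph. Weights are
# illustrative, not the project's real cost values.
from itertools import islice
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([
    ("D-glucose", "D-fructose", 1.0),  # isomerization
    ("D-fructose", "D-mannose", 1.0),  # isomerization
    ("D-glucose", "D-mannose", 1.5),   # C2 epimerization
])

# Take the two cheapest simple paths, in increasing cost order.
k_paths = list(islice(
    nx.shortest_simple_paths(G, "D-glucose", "D-mannose", weight="weight"), 2))
print(k_paths)
# Direct epimerization (cost 1.5) beats the two-step route (cost 2.0).
```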

1. NetworkX + Neo4j + vis.js — required a running server, tightly coupled layers
2. Deterministic Python pipeline — exhaustive enumeration, flat JSON output, no runtime dependencies
3. Static TypeScript frontend — in-memory graph, Yen's K-shortest paths, Vercel deploy

Three decisions

Three decisions shaped the architecture in ways that could not be easily undone.

The first: enumerate from stereochemistry, not from a curated database. Every monosaccharide in the system is generated by permuting R and S configurations across each chiral center, then matched against a name-lookup table for well-known compounds. Compounds without common names are assigned systematic identifiers. This guarantees completeness in the set-theoretic sense. If a stereoisomer can exist, it appears in the graph. No compound is absent because a curator did not enter it, because a paper described it ambiguously, or because a reaction was not considered commercially relevant. The cost of this approach is that many nodes have thin or absent experimental evidence, which is handled explicitly by the evidence tier system rather than hidden.

The second: four evidence tiers on every reaction edge. Validated means the reaction has been demonstrated in at least one peer-reviewed experimental context. Predicted means computational or mechanistic arguments suggest it should work. Inferred means it is structurally analogous to validated reactions in the same enzyme family. Hypothetical means the stoichiometry is correct and no mechanism rule is violated, but no evidence exists. The pathway finder exposes these tiers as a filter. A researcher who needs bench-ready routes can restrict to validated steps only; a computational chemist exploring the full theoretical landscape can include hypothetical edges. The evidence weight is baked into the cost function so that ranked results naturally surface the best-supported routes.
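One way to bake the tiers into the cost function is a per-tier weight added to the cofactor burden. The weights, the 0.5 coefficient, and the edge dict shape below are illustrative assumptions, not the project's actual values:

```python
# Hypothetical evidence-weighted edge cost: lower cost means a
# better-supported, cheaper reaction step.
TIER_WEIGHT = {"validated": 1.0, "predicted": 2.0,
               "inferred": 3.0, "hypothetical": 5.0}

def edge_cost(reaction, allowed_tiers=frozenset(TIER_WEIGHT)):
    """Return the pathfinding cost of an edge, or None if the tier
    filter excludes it."""
    if reaction["tier"] not in allowed_tiers:
        return None  # edge removed by the evidence-tier filter
    cofactor_burden = sum(reaction.get("cofactors", {}).values())
    return TIER_WEIGHT[reaction["tier"]] + 0.5 * cofactor_burden

rxn = {"tier": "validated", "cofactors": {"NADPH": 1}}
print(edge_cost(rxn))  # 1.5
print(edge_cost({"tier": "hypothetical"}, {"validated"}))  # None
```

With this shape, "bench-ready routes only" is just `allowed_tiers={"validated"}` passed to the same cost function.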

The third: static export as a first-class architectural constraint. The decision to generate JSON and run everything client-side was not a compromise forced by budget. It was a deliberate choice made to eliminate an entire class of operational risk. A backend can go down, hit rate limits, accumulate technical debt, or require security patches. A set of JSON files and compiled JavaScript cannot. The pathway finder will return the same answer five years from now as it does today, with no infrastructure maintenance in between.

Evidence classification

Validated — demonstrated in a peer-reviewed experimental context
Predicted — supported by computational or mechanistic arguments
Inferred — structurally analogous to validated reactions
Hypothetical — stoichiometry correct, but no evidence exists

What it found

The current build covers 135 compounds and 696 reactions, validated by 49 tests (44 on the pipeline, 5 on the frontend). The pipeline is fully deterministic. Given the same reaction rules and stereochemistry parameters, it always produces the same output. This matters for reproducibility: the graph is a scientific artifact, not a database that drifts as entries are added or corrected. Any change to the compound set or reaction rules requires a deliberate, versioned pipeline run.
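One way to pin down the "same input, same output" guarantee is to fingerprint the exported JSON under a canonical serialization; the graph structure below is an assumption for illustration, not the project's export layout:

```python
# Fingerprint a graph export so two pipeline runs can be compared
# byte-for-byte, independent of dict key order.
import hashlib
import json

def graph_fingerprint(graph: dict) -> str:
    """SHA-256 of a canonical JSON serialization of the export."""
    canonical = json.dumps(graph, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

run_a = {"compounds": 135, "reactions": 696}
run_b = {"reactions": 696, "compounds": 135}  # same data, different order
print(graph_fingerprint(run_a) == graph_fingerprint(run_b))  # True
```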

The web interface exposes the graph through nine pages: a dashboard with global search and aggregate statistics, a pathway finder with configurable maximum step count and evidence-tier filtering, separate compound and reaction browsers with detail views for each node and edge, an interactive Cytoscape.js network visualization, and a methodology page documenting the enumeration algorithm and evidence classification system. The pathway finder returns up to ten alternative routes ranked by a cost function that encodes both cofactor burden (ATP, NAD+, NADPH consumption) and experimental certainty derived from evidence tiers. A command palette accessible via Cmd+K provides fuzzy compound search across the full dataset without navigating away from the current page.

Enumeration pipeline

1. Enumerate — R/S permutations at each chiral center
2. Classify — aldoses, ketoses, polyols
3. React — 478 epimerizations, 124 isomerizations, 94 reductions
4. Validate — mass balance on every edge
5. Export — static JSON, no database
6. Pathfind — Yen's K-shortest paths, client-side
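The mass-balance validation step amounts to an element-count comparison across both sides of each reaction. The formula dicts and the bare-H2 bookkeeping for the reduction are illustrative assumptions (the real pipeline presumably tracks cofactors such as NADPH instead):

```python
# Sketch of the per-edge mass-balance check: a reaction passes when
# total atoms of each element match on both sides.
from collections import Counter

def mass_balanced(substrates, products):
    """True when element totals on each side of the reaction match."""
    def total(side):
        atoms = Counter()
        for formula in side:
            atoms.update(formula)
        return atoms
    return total(substrates) == total(products)

# A reduction: D-glucose + H2 -> D-glucitol (hypothetical H2 bookkeeping)
glucose = {"C": 6, "H": 12, "O": 6}
h2 = {"H": 2}
glucitol = {"C": 6, "H": 14, "O": 6}
print(mass_balanced([glucose, h2], [glucitol]))  # True
print(mass_balanced([glucose], [glucitol]))      # False: 2 H short
```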

What's next

Three expansion rings are planned but not started. Ring 2 would enrich the reaction edges with cross-references to KEGG reaction IDs, BRENDA enzyme entries, and EC numbers, adding database provenance without depending on it for completeness. Ring 3 would extend the compound set to sugar derivatives: phosphorylated, acetylated, and amino sugars, which are metabolically important but require a more complex reaction rule system to enumerate correctly. Ring 4 would add disaccharides and glycosidic bonds, which opens the door to oligosaccharide synthesis planning but requires representing directional bond formation rather than simple stereoisomer interconversion.

Two architectural questions remain open. Whether to keep pathfinding client-side as the graph grows into Ring 3 and Ring 4, or to introduce a lightweight server component for larger queries, depends on empirical performance data that does not yet exist. The current client-side implementation handles 135 compounds comfortably, and the threshold where it breaks is unknown. The more interesting open question is integration with experimental protocol databases: if SUGAR can compute a route from D-glucose to L-glucose through validated enzymatic steps, can it also auto-generate a bench procedure document (reagent concentrations, buffer conditions, expected yields) from that route? The data to do this exists in scattered literature. The pipeline to assemble it does not.
