PLM-guided directed evolution · open source

Design protein variants with AI.
Skip UV and chemical mutagenesis.

Syntheogenesis turns a single protein sequence into a 30–100 variant smart library, ranked by predicted evolutionary fitness, codon-optimized for your expression host, and ready to order from your synthesis vendor — all in about a minute on a laptop. No UV chamber. No EMS. No guesswork.

ESM-2 35M / 650M / 3B · Runs locally, no telemetry · MIT licensed

Why this exists

UV mutagenesis is a workplace hazard. We removed it from the workflow.

Zero UV exposure

Mutation design moves from your wet bench to a Python process. No UV-C chamber sessions, no photokeratitis risk, no institutional sign-off for radiation use.

Targeted, not random

Every mutation is one the protein language model thinks evolution would tolerate. Hit rates 5–50× higher than random mutagenesis at equal screening cost.

Traceable from day zero

Every variant ships with its mutation list, predicted fitness, optimized DNA, primer pair, and PCR conditions. No Sanger-sequencing rounds to figure out what changed.

How it works

Four stages, one minute.

  1. Parse & translate

    FASTA, SnapGene (.dna), GenBank, EMBL, raw DNA, or raw protein — all auto-detected. Plasmid uploads surface a CDS picker so you choose the right gene. Raw DNA with multiple stops triggers a 6-frame ORF scan with a frame-aware picker.

  2. Zero-shot scoring with ESM-2

    Computes ΔLL = log P(mutant | x_WT) − log P(WT | x_WT) for each of the 19 possible substitutions at every position, following the wild-type marginal scheme of Meier et al. 2021. The default model is ESM-2 35M; larger checkpoints up to 3B are configurable on GPU.

  3. Combinatorial search

    Simulated annealing over the top-percentile pool of single-site mutations. The search runs with multiple restarts, uses the cumulative ΣΔLL as fitness, and penalizes stop codons and duplicate positions.

  4. Codon-optimize & clean

    Reverse-translate with E. coli / yeast / human codon-usage tables. Synonymously scrub BsaI, BsmBI, NotI sites so the library drops straight into Golden Gate. Outputs CSV / Excel / GenBank / FASTA / JSON.
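
To make the scoring and search stages concrete, here is a minimal sketch in plain Python. It is not the project's implementation: every function name is hypothetical, toy_log_probs stands in for real ESM-2 output (it just draws random log-probabilities), the pool fraction, penalty size, and cooling schedule are arbitrary assumptions, and the stop-codon penalty is left out because the sketch works at the protein level.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_log_probs(wt_seq, seed=0):
    """Stand-in for a language model: random per-position log-probabilities."""
    rng = random.Random(seed)
    table = []
    for _ in wt_seq:
        logits = {aa: rng.gauss(0.0, 1.0) for aa in AMINO_ACIDS}
        log_z = math.log(sum(math.exp(v) for v in logits.values()))
        table.append({aa: v - log_z for aa, v in logits.items()})
    return table


def delta_ll_table(wt_seq, log_probs):
    """ΔLL = log P(mutant | x_WT) - log P(WT | x_WT) for 19 substitutions per position."""
    scores = {}
    for pos, wt_aa in enumerate(wt_seq):
        for aa in AMINO_ACIDS:
            if aa != wt_aa:
                scores[(pos, aa)] = log_probs[pos][aa] - log_probs[pos][wt_aa]
    return scores


def top_pool(scores, fraction=0.1):
    """Keep the best-scoring fraction of single-site mutations."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:max(1, int(len(ranked) * fraction))]


def fitness(variant, scores, dup_penalty=10.0):
    """Cumulative ΔLL, minus a penalty for each repeated position."""
    positions = [pos for pos, _ in variant]
    duplicates = len(positions) - len(set(positions))
    return sum(scores[m] for m in variant) - dup_penalty * duplicates


def anneal(scores, pool, n_mut=3, steps=2000, t_start=1.0, t_end=0.01, rng=None):
    """One simulated-annealing run with geometric cooling."""
    rng = rng or random.Random()
    current = [rng.choice(pool) for _ in range(n_mut)]
    cur_fit = fitness(current, scores)
    best, best_fit = list(current), cur_fit
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        cand = list(current)
        cand[rng.randrange(n_mut)] = rng.choice(pool)  # swap one mutation
        cand_fit = fitness(cand, scores)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cand_fit >= cur_fit or rng.random() < math.exp((cand_fit - cur_fit) / temp):
            current, cur_fit = cand, cand_fit
            if cur_fit > best_fit:
                best, best_fit = list(current), cur_fit
    return sorted(best), best_fit


def search(scores, pool, restarts=5, seed=0, **kwargs):
    """Multi-restart wrapper: return the best (variant, fitness) over all runs."""
    rng = random.Random(seed)
    runs = (anneal(scores, pool, rng=rng, **kwargs) for _ in range(restarts))
    return max(runs, key=lambda run: run[1])
```

Calling search(scores, top_pool(scores), n_mut=3) returns one variant as a sorted list of (position, residue) pairs together with its summed ΔLL. A real run would replace toy_log_probs with per-position log-probabilities from the chosen ESM-2 checkpoint.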

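The codon-optimization and site-scrubbing stage can be sketched the same way. This is an illustrative sketch, not the tool's code: the function names are hypothetical, PREFERRED is a one-codon-per-residue table of commonly cited high-frequency E. coli codons that should be swapped for a full codon-usage table in real use, and the scrub loop is a simple greedy strategy chosen for brevity. The recognition sequences are the standard ones for BsaI (GGTCTC), BsmBI (CGTCTC), and NotI (GCGGCCGC), checked on both strands.

```python
BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = {}
for _codon, _aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)

# Assumption: one frequently used E. coli codon per residue (illustrative only).
PREFERRED = dict(zip(
    "ARNDCQEGHILKMFPSTWYV",
    "GCG CGT AAC GAT TGC CAG GAA GGC CAT ATT CTG AAA ATG TTT CCG AGC ACC TGG TAT GTG".split(),
))

SITES = ("GGTCTC", "CGTCTC", "GCGGCCGC")  # BsaI, BsmBI, NotI


def revcomp(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def find_sites(dna):
    """Return (start, length) for every site occurrence on either strand."""
    hits = []
    for site in SITES:
        for motif in {site, revcomp(site)}:
            start = dna.find(motif)
            while start != -1:
                hits.append((start, len(motif)))
                start = dna.find(motif, start + 1)
    return hits


def scrub(dna):
    """Greedily swap synonymous codons until no recognition site remains."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    while True:
        hits = find_sites("".join(codons))
        if not hits:
            return "".join(codons)
        start, length = hits[0]
        fixed = False
        # Try each codon that overlaps the first site.
        for ci in range(start // 3, (start + length - 1) // 3 + 1):
            for alt in SYNONYMS[CODON_TO_AA[codons[ci]]]:
                trial = codons[:ci] + [alt] + codons[ci + 1:]
                # Accept only swaps that strictly reduce the total site count.
                if len(find_sites("".join(trial))) < len(hits):
                    codons, fixed = trial, True
                    break
            if fixed:
                break
        if not fixed:
            raise ValueError("could not remove site at position %d" % start)


def optimize(protein):
    """Reverse-translate with the preferred codons, then scrub sites."""
    return scrub("".join(PREFERRED[aa] for aa in protein))
```

For example, scrub("ATGGGTCTCAAA") rewrites the glycine codon so the BsaI site disappears while the encoded peptide stays the same. Because each accepted swap strictly lowers the site count, the loop either finishes clean or raises when no synonymous fix exists.
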
What to trust

Honest accuracy expectations.

PLM-guided libraries don't guarantee functional variants. They make screening drastically more efficient.

Mutations / variant    Approximate functional retention
1–2                    70–85%
3–4                    50–70%
5–6                    30–55%
7–8                    15–40%
9+                     typically < 25%

The honest framing: stay shallow (cap at 3–4 mutations), screen experimentally, and treat the fitness score as a prior, not a verdict.

Try it on your protein

Open source. Free to use. Your sequences never leave your machine for the core pipeline (BLAST and AlphaFold lookups are opt-in).

Hugging Face Space link goes live after the deploy. Until then, clone the repo and run python -m dee.server locally.

Built on

Open models & standard biology.