Protection Strategy Scoring#

AI-generated synthesis routes frequently contain selectivity issues — reactions where multiple functional groups could react, leading to unintended side products. This tutorial demonstrates how to detect these competing sites and score routes for selectivity quality using the synplan.route_quality.protection module.

The approach is inspired by:

Westerlund, A. M.; Sigmund, L. M.; Kannas, C.; Genheden, S.; Kabeshov, M.
“Toward lab-ready AI synthesis plans with protection strategies and route scoring.”

What this tutorial covers#

  1. Functional group detection — identify reactive FGs in molecules via SMARTS

  2. Competing site identification — find FGs that could interfere with a reaction

  3. Reaction type classification — classify reactions using CGR bond analysis

  4. Route scanning — walk a route step-by-step to find all competing interactions

  5. S(T) scoring — compute the competing-sites score for routes

  6. Route re-ranking — combine S(T) with the search score for selectivity-aware ranking

1. Setup#

[1]:
from pathlib import Path
from synplan.utils.loading import download_preset

# download SynPlanner preset data
paths = download_preset("synplanner-article", save_to="synplan_data")
ranking_policy_network = paths["ranking_policy"]
reaction_rules_path = paths["reaction_rules"]
# use your custom building blocks if needed
building_blocks_path = paths["building_blocks"]
[2]:
import synplan
from synplan.route_quality.protection import (
    ProtectionConfig,
    FunctionalGroupDetector,
    FunctionalGroupMatch,
    IncompatibilityMatrix,
    RouteScanner,
    CompetingSitesScore,
    classify_reaction_type,
    get_reaction_center_atoms,
)
print(f"SynPlanner route_quality.protection module loaded successfully")
SynPlanner route_quality.protection module loaded successfully

2. Functional Group Detection#

The FunctionalGroupDetector uses SMARTS patterns to identify reactive functional groups in molecules. The bundled library contains 19 patterns across three categories:

Category

Groups

Nucleophile

hydroxyl, phenol, primary/secondary amine, thiol, carboxylic acid, pyrrole/indole/amide NH, sulfonamide

Electrophile

aldehyde, ketone, acyl chloride, epoxide, isocyanate

Unsaturated

alkene, alkyne, Michael acceptor, diene

[3]:
from chython import smiles

config = ProtectionConfig()
detector = FunctionalGroupDetector(config.competing_groups_path)

print(f"Loaded {len(detector._patterns)} SMARTS patterns")
Loaded 102 SMARTS patterns

Detect FGs in a single molecule#

Let’s examine a molecule with multiple reactive functional groups — an amino acid derivative with hydroxyl, amine, and carboxylic acid groups.

[4]:
# Serine: amino acid with OH, NH2, and COOH
mol = smiles("OC(C(N)C(O)=O)=O")

matches = detector.detect_all(mol)
print(f"Found {len(matches)} functional group matches:\n")
for m in matches:
    print(f"  {m.name:20s} ({m.category:14s})  atoms: {m.atom_indices}")
Found 4 functional group matches:

  NonProlineAlphaAminoAcid_unprotected (AminoAcid     )  atoms: (4,)
  Acid_SaturatedAliphatic (Acid          )  atoms: (6,)
  Acid_SaturatedAliphatic (Acid          )  atoms: (1,)
  Amine_Primary_SaturatedAliphatic (Amine         )  atoms: (4,)

Identify competing FGs#

Given a set of reaction center atoms, detect_competing() returns only the FGs that are not at the reaction center — these are the ones that could interfere.

[5]:
# Suppose the reaction center involves the carboxylic acid (atoms at specific indices)
# First, let's see what atoms are in the molecule
for n, atom in mol.atoms():
    print(f"  atom {n}: {atom}")

print("\n--- Detecting competing FGs (excluding reaction center) ---")
# Pick the first two atoms as a hypothetical reaction center
first_match = matches[0]
reaction_center = set(first_match.atom_indices)
print(f"Reaction center atoms: {reaction_center} ({first_match.name})")

competing = detector.detect_competing(mol, reaction_center)
print(f"\nCompeting FGs ({len(competing)}):")
for c in competing:
    print(f"  {c.name:20s} ({c.category:14s})  atoms: {c.atom_indices}")
  atom 1: O()
  atom 2: C()
  atom 3: C()
  atom 4: N()
  atom 5: C()
  atom 6: O()
  atom 7: O()
  atom 8: O()

--- Detecting competing FGs (excluding reaction center) ---
Reaction center atoms: {4} (NonProlineAlphaAminoAcid_unprotected)

Competing FGs (2):
  Acid_SaturatedAliphatic (Acid          )  atoms: (6,)
  Acid_SaturatedAliphatic (Acid          )  atoms: (1,)

3. Reaction Classification#

The reaction classifier uses CGR (Condensed Graph of Reaction) bond analysis to classify reactions into broad types:

  • bond_formation — only new bonds are formed

  • bond_breaking — only existing bonds are broken

  • substitution — bonds are both formed and broken (most common)

  • other — no detectable bond changes

[6]:
from chython import smiles as parse_smiles

# Fischer esterification (mapped SMILES)
rxn = parse_smiles("[CH3:1][C:2](=[O:4])[OH:3].[CH3:5][CH2:6][OH:7]>>[CH3:1][C:2](=[O:4])[O:7][CH2:6][CH3:5].[OH2:3]")

center = get_reaction_center_atoms(rxn)
rxn_type = classify_reaction_type(rxn)

print(f"Reaction center atoms: {center}")
print(f"Reaction type: {rxn_type}")
Reaction center atoms: {2, 3, 7}
Reaction type: substitution

4. Run Retrosynthetic Planning#

Let’s plan routes for a target molecule and then score them for selectivity issues. We use capivasertib — an FDA-approved anti-cancer drug with multiple reactive functional groups.

Integrated scoring: By passing a ProtectionRouteScorer to the Tree, route scores are automatically weighted by S(T). This means tree.route_score(node_id) already reflects protection quality — no manual re-ranking needed.

[7]:
from synplan.chem.utils import mol_from_smiles
from synplan.mcts.tree import Tree
from synplan.utils.config import TreeConfig, RolloutEvaluationConfig
from synplan.utils.loading import (
    load_building_blocks,
    load_reaction_rules,
    load_policy_function,
    load_evaluation_function,
)

# Capivasertib — multiple reactive FGs (amide, amine, hydroxyl, heterocycles)
target_smiles = "NC1(C(=O)N[C@@H](CCO)c2ccc(Cl)cc2)CCN(c2nc[nH]c3nccc2-3)CC1"
target_mol = mol_from_smiles(target_smiles, clean2d=True, standardize=True, clean_stereo=True)

# Check what FGs the target has
target_fgs = detector.detect_all(target_mol)
print(f"Target molecule FGs ({len(target_fgs)}):")
for fg in target_fgs:
    print(f"  {fg.name} ({fg.category})")
Target molecule FGs (2):
  PrimaryAlcoholAliphatic (Alcohol)
  Amine_Primary_SaturatedAliphatic (Amine)
[8]:
target_mol
[8]:
../_images/user_guide_07_Protection_Scoring_14_0.svg
[9]:
from synplan.chem.utils import mol_from_smiles
from synplan.mcts.tree import Tree
from synplan.route_quality.scorer import ProtectionRouteScorer
from synplan.utils.config import TreeConfig, RolloutEvaluationConfig
from synplan.utils.loading import (
    load_building_blocks,
    load_reaction_rules,
    load_policy_function,
    load_evaluation_function,
)

# Capivasertib — multiple reactive FGs (amide, amine, hydroxyl, heterocycles)
target_smiles = "NC1(C(=O)N[C@@H](CCO)c2ccc(Cl)cc2)CCN(c2nc[nH]c3nccc2-3)CC1"
target_mol = mol_from_smiles(target_smiles, clean2d=True, standardize=True, clean_stereo=True)

# Check what FGs the target has
target_fgs = detector.detect_all(target_mol)
print(f"Target molecule FGs ({len(target_fgs)}):")
for fg in target_fgs:
    print(f"  {fg.name} ({fg.category})")

# Load resources and run search
building_blocks = load_building_blocks(building_blocks_path, standardize=False)
reaction_rules = load_reaction_rules(reaction_rules_path)
policy_network = load_policy_function(weights_path=ranking_policy_network)

tree_config = TreeConfig(
    search_strategy="expansion_first",
    max_iterations=300,
    max_time=120,
    max_depth=9,
    min_mol_size=1,
    init_node_value=0.5,
    ucb_type="uct",
    c_ucb=0.1,
)

eval_config = RolloutEvaluationConfig(
    policy_network=policy_network,
    reaction_rules=reaction_rules,
    building_blocks=building_blocks,
    min_mol_size=tree_config.min_mol_size,
    max_depth=tree_config.max_depth,
    normalize=True,
)
evaluation_function = load_evaluation_function(eval_config)

# Protection-aware scorer: route_score() will automatically apply S(T) weighting
route_scorer = ProtectionRouteScorer.from_config()

tree = Tree(
    target=target_mol,
    config=tree_config,
    reaction_rules=reaction_rules,
    building_blocks=building_blocks,
    expansion_function=policy_network,
    evaluation_function=evaluation_function,
    route_scorer=route_scorer,
)

for solved, node_id in tree:
    pass

print(f"Found {len(tree.winning_nodes)} routes")
print(f"route_score() now includes S(T) protection weighting automatically")
tree
Target molecule FGs (2):
  PrimaryAlcoholAliphatic (Alcohol)
  Amine_Primary_SaturatedAliphatic (Amine)
Lightning automatically upgraded your loaded checkpoint from v1.9.5 to v2.6.1. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint synplan_data/policy/supervised_gcn/v1/v1/ranking_policy.ckpt`
Found 128 routes
route_score() now includes S(T) protection weighting automatically
[9]:
Tree for: c1cc(ccc1Cl)C(NC(C2(CCN(CC2)c3c4cc[nH]c4ncn3)N)=O)CCO
Time: 17.7 seconds
Number of nodes: 3708
Number of iterations: 300
Number of visited nodes: 300
Number of found routes: 128

5. Extract and Score Routes#

Extract routes as routes_dict (the standard format: {route_id: {step_id: ReactionContainer}}), then score each route with S(T).

[10]:
from synplan.chem.reaction_routes.route_cgr import extract_reactions

routes_dict = extract_reactions(tree)
print(f"Extracted {len(routes_dict)} routes")

# Show route lengths
for rid, route in list(routes_dict.items())[:5]:
    print(f"  Route {rid}: {len(route)} steps")
Extracted 128 routes
  Route 71: 5 steps
  Route 72: 5 steps
  Route 74: 5 steps
  Route 201: 7 steps
  Route 202: 7 steps

Set up the protection scorer#

The scorer pipeline is:

FunctionalGroupDetector → RouteScanner → CompetingSitesScore
         ↑                      ↑
  SMARTS patterns    IncompatibilityMatrix
[11]:
matrix = IncompatibilityMatrix(config.incompatibility_path)
scanner = RouteScanner(detector, matrix)
scorer = CompetingSitesScore(scanner)

print("Scorer ready.")
Scorer ready.

Score a single route in detail#

Let’s look at one route and see exactly which competing interactions were found.

[12]:
first_rid = next(iter(routes_dict))
first_route = routes_dict[first_rid]

score, interactions = scorer.score_route(first_route)

print(f"Route {first_rid}: S(T) = {score:.3f}")
print(f"Number of steps: {len(first_route)}")
print(f"Number of competing interactions: {len(interactions)}")

if interactions:
    print(f"\nDetailed interactions:")
    for inter in interactions:
        print(
            f"  Step {inter.step_id}: "
            f"{inter.fg_name} vs {inter.reacting_fg or 'unknown'}{inter.severity}"
        )
Route 71: S(T) = 0.900
Number of steps: 5
Number of competing interactions: 5

Detailed interactions:
  Step 0: NonProlineAlphaAminoAcid_protected vs Amine_CyclicSecondary_SaturatedAliphatic-SaturatedAliphatic → compatible
  Step 0: Acid_SaturatedAliphatic vs Amine_CyclicSecondary_SaturatedAliphatic-SaturatedAliphatic → compatible
  Step 1: NonProlineAlphaAminoAcid_protected vs Amine_CyclicSecondary_SaturatedAliphatic-SaturatedAliphatic → compatible
  Step 1: Acid_SaturatedAliphatic vs Amine_CyclicSecondary_SaturatedAliphatic-SaturatedAliphatic → compatible
  Step 4: PrimaryAlcoholAliphatic vs Amine_Primary_SaturatedAliphatic → competing

Score all routes#

Compute S(T) for every route and look at the distribution.

[13]:
scores = {}
for rid, route in routes_dict.items():
    s, _ = scorer.score_route(route)
    scores[rid] = s

# Distribution summary
score_vals = list(scores.values())
n_perfect = sum(1 for s in score_vals if s == 1.0)
n_good = sum(1 for s in score_vals if 0.75 <= s < 1.0)
n_moderate = sum(1 for s in score_vals if 0.5 <= s < 0.75)
n_poor = sum(1 for s in score_vals if s < 0.5)

print(f"S(T) score distribution across {len(scores)} routes:")
print(f"  Perfect  (S=1.0):     {n_perfect:4d}  ({100*n_perfect/len(scores):.1f}%)")
print(f"  Good     (0.75-1.0):  {n_good:4d}  ({100*n_good/len(scores):.1f}%)")
print(f"  Moderate (0.5-0.75):  {n_moderate:4d}  ({100*n_moderate/len(scores):.1f}%)")
print(f"  Poor     (<0.5):      {n_poor:4d}  ({100*n_poor/len(scores):.1f}%)")
print(f"\nMean S(T): {sum(score_vals)/len(score_vals):.3f}")
print(f"Min  S(T): {min(score_vals):.3f}")
print(f"Max  S(T): {max(score_vals):.3f}")
S(T) score distribution across 128 routes:
  Perfect  (S=1.0):       13  (10.2%)
  Good     (0.75-1.0):    78  (60.9%)
  Moderate (0.5-0.75):    37  (28.9%)
  Poor     (<0.5):         0  (0.0%)

Mean S(T): 0.799
Min  S(T): 0.500
Max  S(T): 1.000

6. Route Re-ranking#

Since we passed a ProtectionRouteScorer to the Tree, tree.route_score(node_id) already returns the protection-weighted score:

score = original * S(T)

The manual re-ranking below demonstrates the rank_routes() API for cases where you want a different blending (e.g. additive with custom weight), or when working with saved routes outside of the tree.

[14]:
# Get original route scores from the tree
existing_scores = {
    node_id: tree.route_score(node_id)
    for node_id in tree.winning_nodes
}

# Re-rank with combined score
ranked = scorer.rank_routes(
    routes_dict,
    existing_scores=existing_scores,
    weight=0.75,
)

print(f"{'Rank':>4}  {'Route':>6}  {'Combined':>8}  {'S(T)':>6}  {'Original':>8}")
print("-" * 42)
for i, (rid, combined, protection, original) in enumerate(ranked[:15]):
    print(f"{i+1:4d}  {rid:6d}  {combined:8.3f}  {protection:6.3f}  {original:8.4f}")
Rank   Route  Combined    S(T)  Original
------------------------------------------
   1    1299     0.911   1.000    0.1088
   2    1300     0.911   1.000    0.1088
   3    1302     0.911   1.000    0.1088
   4    2932     0.897   1.000    0.0990
   5    2933     0.897   1.000    0.0990
   6    2935     0.897   1.000    0.0990
   7    1465     0.879   1.000    0.0868
   8    2035     0.876   1.000    0.0851
   9    2045     0.876   1.000    0.0851
  10    2047     0.876   1.000    0.0851
  11    1484     0.865   1.000    0.0777
  12    1485     0.865   1.000    0.0777
  13    1487     0.865   1.000    0.0777
  14    3105     0.860   0.900    0.1247
  15    3106     0.860   0.900    0.1247

Compare original vs re-ranked top routes#

See which routes moved up (fewer selectivity issues) or down (more issues).

[15]:
# Original ranking (by search score)
original_ranking = sorted(existing_scores.items(), key=lambda x: x[1], reverse=True)
original_top10 = [rid for rid, _ in original_ranking[:10]]

# New ranking (by combined score)
new_top10 = [rid for rid, _, _, _ in ranked[:10]]

promoted = set(new_top10) - set(original_top10)
demoted = set(original_top10) - set(new_top10)

print(f"Original top-10: {original_top10}")
print(f"New top-10:      {new_top10}")
print(f"\nPromoted into top-10:  {promoted or 'none'}")
print(f"Demoted from top-10:   {demoted or 'none'}")
Original top-10: [322, 766, 3105, 3106, 3108, 439, 71, 72, 74, 1299]
New top-10:      [1299, 1300, 1302, 2932, 2933, 2935, 1465, 2035, 2045, 2047]

Promoted into top-10:  {2935, 2035, 2932, 2933, 1300, 1302, 1465, 2045, 2047}
Demoted from top-10:   {3105, 322, 3106, 3108, 71, 72, 74, 439, 766}

Route Visualisation#

Let’s visualise routes to see how competing functional groups correlate with route structure. We’ll compare:

  1. A problematic route (lowest S(T)) that has many competing interactions

  2. The best available route (highest S(T)) with the fewest selectivity issues

[16]:
from IPython.display import SVG, display, HTML
from synplan.utils.visualisation import get_route_svg

# Sort routes by S(T) score
sorted_by_score = sorted(scores.items(), key=lambda x: x[1])

# Worst route (lowest S(T))
worst_rid, worst_score = sorted_by_score[0]
# Best route (highest S(T))
best_rid, best_score = sorted_by_score[-1]

print(f"Worst route: {worst_rid}  S(T) = {worst_score:.3f}")
print(f"Best route:  {best_rid}  S(T) = {best_score:.3f}")
Worst route: 342  S(T) = 0.500
Best route:  2935  S(T) = 1.000

Problematic route (lowest S(T))#

This route has the most competing functional group interactions. The detailed interaction table below the diagram shows which FGs conflict at each step.

[17]:
# Visualise the worst route
worst_svg = get_route_svg(tree, worst_rid)
if worst_svg:
    display(HTML(f"<h4>Route {worst_rid} — S(T) = {worst_score:.3f}</h4>"))
    display(SVG(worst_svg))
else:
    print(f"Route {worst_rid} could not be visualised (not in winning nodes).")

Route 342 — S(T) = 0.500

../_images/user_guide_07_Protection_Scoring_31_1.svg
[18]:
# Show competing interactions for the worst route
worst_route = routes_dict[worst_rid]
_, worst_interactions = scorer.score_route(worst_route)
i_count, c_count, h_count = RouteScanner.classify_interactions(worst_interactions)

print(f"Route {worst_rid}: {len(worst_route)} steps, "
      f"{len(worst_interactions)} interactions "
      f"(I={i_count}, C={c_count}, H={h_count})\n")

# Group by step
from collections import defaultdict
by_step = defaultdict(list)
for inter in worst_interactions:
    by_step[inter.step_id].append(inter)

for step_id in sorted(by_step):
    step_inters = by_step[step_id]
    non_compatible = [i for i in step_inters if i.severity != "compatible"]
    reacting = step_inters[0].reacting_fg or "unknown"
    if non_compatible:
        print(f"Step {step_id} (reacting FG: {reacting}):")
        for i in non_compatible:
            marker = "!!" if i.severity == "incompatible" else " ~"
            print(f"  {marker} {i.fg_name}{i.severity}")
        print()
Route 342: 3 steps, 8 interactions (I=0, C=5, H=0)

Step 0 (reacting FG: Amine_CyclicSecondary_SaturatedAliphatic-SaturatedAliphatic):
   ~ NonProlineAlphaAminoAcid_unprotected → competing
   ~ Amine_Primary_SaturatedAliphatic → competing

Step 1 (reacting FG: Amine_CyclicSecondary_SaturatedAliphatic-SaturatedAliphatic):
   ~ NonProlineAlphaAminoAcid_unprotected → competing
   ~ Amine_Primary_SaturatedAliphatic → competing

Step 2 (reacting FG: Acid_SaturatedAliphatic):
   ~ PrimaryAlcoholAliphatic → competing

Best available route (highest S(T))#

This route has the fewest selectivity concerns. In a real workflow, this is the route you’d prefer to execute in the lab.

[19]:
# Visualise the best route
best_svg = get_route_svg(tree, best_rid)
if best_svg:
    display(HTML(f"<h4>Route {best_rid} — S(T) = {best_score:.3f}</h4>"))
    display(SVG(best_svg))
else:
    print(f"Route {best_rid} could not be visualised (not in winning nodes).")

Route 2935 — S(T) = 1.000

../_images/user_guide_07_Protection_Scoring_34_1.svg
[20]:
# Show interactions for the best route
best_route = routes_dict[best_rid]
_, best_interactions = scorer.score_route(best_route)
i_best, c_best, h_best = RouteScanner.classify_interactions(best_interactions)

print(f"Route {best_rid}: {len(best_route)} steps, "
      f"{len(best_interactions)} interactions "
      f"(I={i_best}, C={c_best}, H={h_best})\n")

by_step_best = defaultdict(list)
for inter in best_interactions:
    by_step_best[inter.step_id].append(inter)

for step_id in sorted(by_step_best):
    step_inters = by_step_best[step_id]
    non_compatible = [i for i in step_inters if i.severity != "compatible"]
    reacting = step_inters[0].reacting_fg or "unknown"
    if non_compatible:
        print(f"Step {step_id} (reacting FG: {reacting}):")
        for i in non_compatible:
            marker = "!!" if i.severity == "incompatible" else " ~"
            print(f"  {marker} {i.fg_name}{i.severity}")
        print()
    else:
        print(f"Step {step_id} (reacting FG: {reacting}): all compatible")
Route 2935: 6 steps, 7 interactions (I=0, C=0, H=0)

Step 0 (reacting FG: Amine_CyclicSecondary_SaturatedAliphatic-SaturatedAliphatic): all compatible
Step 1 (reacting FG: Amine_CyclicSecondary_SaturatedAliphatic-SaturatedAliphatic): all compatible
Step 4 (reacting FG: Acid_SaturatedAliphatic): all compatible
Step 5 (reacting FG: unknown): all compatible

Top re-ranked route#

After combining the search score with the protection score, the top-ranked route represents the best trade-off between synthetic feasibility (original score) and selectivity (S(T)).

[21]:
# Top re-ranked route (best combined score)
top_rid, top_combined, top_protection, top_original = ranked[0]

print(f"Top re-ranked route: {top_rid}")
print(f"  Combined score:   {top_combined:.3f}")
print(f"  Protection S(T):  {top_protection:.3f}")
print(f"  Original score:   {top_original:.4f}\n")

top_svg = get_route_svg(tree, top_rid)
if top_svg:
    display(HTML(f"<h4>Top re-ranked: Route {top_rid} — combined = {top_combined:.3f}</h4>"))
    display(SVG(top_svg))
else:
    print(f"Route {top_rid} could not be visualised.")
Top re-ranked route: 1299
  Combined score:   0.911
  Protection S(T):  1.000
  Original score:   0.1088

Top re-ranked: Route 1299 — combined = 0.911

../_images/user_guide_07_Protection_Scoring_37_2.svg

7. Standalone Usage (from saved routes)#

You can also score previously saved routes without re-running the search.

[22]:
from synplan.chem.reaction_routes.io import read_routes_json

# Example: load from a JSON file produced by a previous search
# routes_dict = read_routes_json("path/to/routes.json", to_dict=True)

# Set up scorer from scratch (uses bundled defaults)
config = ProtectionConfig()
detector = FunctionalGroupDetector(config.competing_groups_path)
matrix = IncompatibilityMatrix(config.incompatibility_path)
scanner = RouteScanner(detector, matrix)
scorer = CompetingSitesScore(scanner)

# Score and rank (protection score only, no original search score)
ranked_protection_only = scorer.rank_routes(routes_dict, weight=1.0)

print(f"{'Rank':>4}  {'Route':>6}  {'S(T)':>6}")
print("-" * 20)
for i, (rid, combined, protection, _) in enumerate(ranked_protection_only[:10]):
    print(f"{i+1:4d}  {rid:6d}  {protection:6.3f}")
Rank   Route    S(T)
--------------------
   1    1299   1.000
   2    1300   1.000
   3    1302   1.000
   4    1465   1.000
   5    1484   1.000
   6    1485   1.000
   7    1487   1.000
   8    2035   1.000
   9    2045   1.000
  10    2047   1.000

8. Interpreting the Score#

The competing-sites score S(T) is computed as:

\[S(\mathcal{T}) = \max\left[1 - \frac{I + 0.5 C + H}{\max(N_{\text{reactions}}, 1)},\; 0\right]\]

Where:

  • I = number of highly incompatible functional group interactions (penalty = 1.0 each)

  • C = number of competing interactions (penalty = 0.5 each)

  • H = number of halogen competing sites (penalty = 1.0 each; reserved for future)

  • N = number of reaction steps in the route

Implementation note

The actual implementation uses a worst-per-step variant of the formula above. Instead of summing all individual interaction penalties, each step contributes only the penalty of its most severe interaction (1.0 for incompatible, 0.5 for competing, 0.0 for compatible). This prevents drug-like molecules with many functional groups from overwhelming the score. The formula becomes: S(T) = max[1 - (sum(w_s) + H) / max(N, 1), 0], where w_s = max severity penalty among interactions in step s.

S(T)

Interpretation

1.0

No competing sites — route is clean

0.75–0.99

Minor issues (e.g., 1 competing FG in a 4-step route)

0.5–0.75

Moderate selectivity issues — may benefit from protecting groups

< 0.5

Significant problems — route should be deprioritized

0.0

Maximally conflicting

References#

  • Westerlund, A. M. et al. “Toward lab-ready AI synthesis plans with protection strategies and route scoring.” ChemRxiv, 2025. doi:10.26434/chemrxiv-2025-gdrr8

  • Kogej, T. et al. “SMARTS-RX: A SMARTS-Based Representation of Chemical Functions for Reactivity Analysis.” ChemRxiv, 2025.

  • SynPlanner documentation: synplanner.readthedocs.io