Policy Network Training

This tutorial demonstrates how to train the ranking and filtering policy networks in SynPlanner.

Basic recommendations

1. Prefer the ranking policy network over the filtering policy network

In its current implementation, the filtering policy network requires substantial computational resources: for large training sets, training is practically feasible only with many CPUs and several dozen GB of RAM. The bottleneck is the preparation of the training dataset, particularly the generation of binary vectors of successfully applied reaction rules for each training molecule. With limited computational resources, the ranking policy network is therefore recommended.

2. Use a filtering policy network for the portability of reaction rules between different tools

Filtering policy networks can be trained with any set of reaction rules, including those generated with other software, because filtering network training does not depend on the original reaction dataset from which the reaction rules were extracted. The filtering policy network can therefore be used to compare reaction rules extracted with different software tools.

3. Reduce the size of the training set of molecules for the filtering policy network

The computational cost of training filtering policy networks can be partially mitigated by drastically reducing the training set of molecules.
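The cost argument behind recommendations 1 and 3 can be illustrated with a toy sketch. It is not SynPlanner code: the "rules" below are hypothetical substring checks standing in for real reaction rules, and `applicable_rules_vector` and `subsample` are illustrative helper names. The point is only that every training molecule needs one applicability test per rule, so the number of tests scales with the number of molecules times the number of rules, and subsampling the molecules shrinks that product proportionally.

```python
import random

# Hypothetical stand-ins for reaction rules: each "rule" is a predicate that
# reports whether it applies to a molecule given as a SMILES string.
# Real rule application is far more expensive than a substring check.
toy_rules = [
    lambda smi: "C(=O)O" in smi,  # toy rule 0
    lambda smi: "N" in smi,       # toy rule 1
    lambda smi: "Br" in smi,      # toy rule 2
]


def applicable_rules_vector(smiles, rules):
    """Return a binary vector with 1 wherever a rule applies to the molecule."""
    return [1 if rule(smiles) else 0 for rule in rules]


def subsample(molecules, fraction, seed=0):
    """Keep a reproducible random fraction of the training molecules."""
    k = max(1, int(len(molecules) * fraction))
    return random.Random(seed).sample(molecules, k)


molecules = ["CC(=O)OC", "CCN", "c1ccccc1Br", "CCO", "NCC(=O)O", "CCCC"]
subset = subsample(molecules, fraction=0.5)

# One binary label vector per retained molecule.
labels = [applicable_rules_vector(smi, toy_rules) for smi in subset]

# Number of rule-applicability tests before and after subsampling.
full_cost = len(molecules) * len(toy_rules)
reduced_cost = len(subset) * len(toy_rules)
print(full_cost, reduced_cost)
```

A fixed seed keeps the subsample reproducible, so a dataset built from it can be regenerated later. How small the subset can be before the network quality degrades is not something this sketch addresses.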

1. Set up input and output data locations

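The molecules for filtering policy training are downloaded from the HuggingFace repository to the specified data folder. The reaction rules and the filtered reaction data are read from the results folder, where they are expected to have been produced by the reaction curation and reaction rules extraction tutorials.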

[ ]:

import os
from pathlib import Path
from synplan.utils.loading import download_unpack_data

data_folder = Path("synplan_data").resolve()

# results folder
results_folder = Path("tutorial_results").resolve()
results_folder.mkdir(exist_ok=True)

# input data
reaction_rules_path = results_folder.joinpath("uspto_reaction_rules.tsv")  # needed for both ranking and filtering policy network training

filtered_data_path = results_folder.joinpath("uspto_filtered.smi")  # needed for ranking policy network training

# Download molecules for filtering policy training from new repo
molecules_data_path = download_unpack_data(
    filename="molecules_for_training.smi.zip",
    subfolder="training_data/filtering_policy/2024-12-31",
    save_to=data_folder,
)

# output data
ranking_policy_network_folder = results_folder.joinpath("ranking_policy_network")
filtering_policy_network_folder = results_folder.joinpath("filtering_policy_network")

# output datasets
ranking_policy_dataset_path = ranking_policy_network_folder.joinpath("ranking_policy_dataset.pt")
filtering_policy_dataset_path = filtering_policy_network_folder.joinpath("filtering_policy_dataset.pt")
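Because two of the inputs above are not downloaded in this tutorial, it can help to check that they exist before starting a dataset build that may run for hours. The following is a minimal sketch, not part of SynPlanner: `missing_inputs` is a hypothetical helper, and the demonstration uses temporary files rather than the real tutorial paths.

```python
import tempfile
from pathlib import Path


def missing_inputs(paths):
    """Return the paths from the given list that do not exist on disk."""
    return [Path(p) for p in paths if not Path(p).exists()]


# Self-contained demonstration with one existing and one absent file.
with tempfile.TemporaryDirectory() as tmp:
    present = Path(tmp) / "present.smi"
    present.write_text("CCO\n")
    absent = Path(tmp) / "absent.tsv"
    demo_names = [p.name for p in missing_inputs([present, absent])]

print(demo_names)

# With the variables defined in the cell above, the check would look like:
# missing = missing_inputs([reaction_rules_path, filtered_data_path, molecules_data_path])
# if missing:
#     raise FileNotFoundError(f"Missing input files: {missing}")
```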

2. Ranking policy training

Ranking network configuration

[2]:

from synplan.utils.config import PolicyNetworkConfig
from synplan.ml.training.supervised import create_policy_dataset, run_policy_training

training_config = PolicyNetworkConfig(
    policy_type="ranking",  # the type of policy network
    num_conv_layers=5,  # the number of graph convolutional layers in the network
    vector_dim=512,  # the dimensionality of the final embedding vector
    learning_rate=0.0008,  # the learning rate for the training process
    dropout=0.4,  # the dropout rate
    num_epoch=100,  # the number of epochs for training
    batch_size=100,  # the size of training batch of input data
)

Creating ranking network training set

Next, we create the policy dataset using the create_policy_dataset function. This involves specifying paths to the reaction rules and the reaction data:

[3]:

datamodule = create_policy_dataset(
    dataset_type="ranking",
    reaction_rules_path=reaction_rules_path,
    molecules_or_reactions_path=filtered_data_path,
    output_path=ranking_policy_dataset_path,
    batch_size=training_config.batch_size,
    num_cpus=4,
)

Number of reactions processed: 1019304 [2:36:15]

Training set size: 616841, validation set size: 154211
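The numbers in this output can be related to the training configuration with a little arithmetic. The sketch below uses only the values printed above and the `batch_size=100` from the configuration. It assumes that the last, partially filled batch of each epoch is kept rather than dropped, which the output does not confirm.

```python
import math

processed = 1_019_304  # reactions processed, from the log above
train_size = 616_841   # training set size, from the log above
val_size = 154_211     # validation set size, from the log above
batch_size = 100       # from the training configuration
num_epoch = 100        # from the training configuration

total = train_size + val_size
train_fraction = train_size / total       # share of examples used for training
retained_fraction = total / processed     # share of processed reactions kept

# Assumption: the final partial batch is kept, hence the ceiling.
steps_per_epoch = math.ceil(train_size / batch_size)
total_steps = steps_per_epoch * num_epoch

print(f"examples in dataset: {total}")
print(f"train fraction: {train_fraction:.3f}")
print(f"retained fraction: {retained_fraction:.3f}")
print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
```

The split works out to roughly 80% training and 20% validation, and about three quarters of the processed reactions end up in the dataset. The log does not state why the remaining reactions were not included.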

Running ranking policy network training

Finally, we train the policy network using the run_policy_training function. This step involves feeding the dataset and the training configuration into the network:

GPU requirement

By default, run_policy_training uses accelerator="gpu". If you do not have a GPU available, pass accelerator="cpu" or accelerator="auto" (which will automatically detect the best available device).
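If the same notebook is run on machines with and without a GPU, the accelerator can also be chosen explicitly before training. The following is a minimal sketch, assuming PyTorch is the backend. `pick_accelerator` is a hypothetical helper, not a SynPlanner function, and it checks only for CUDA devices.

```python
def pick_accelerator():
    """Return "gpu" if PyTorch reports a CUDA device, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch no GPU check is possible; fall back to CPU.
        return "cpu"
    return "gpu" if torch.cuda.is_available() else "cpu"


accelerator = pick_accelerator()
print(accelerator)

# The value can then be passed as described in the note above, e.g.:
# run_policy_training(
#     datamodule,
#     config=training_config,
#     results_path=ranking_policy_network_folder,
#     accelerator=accelerator,
# )
```

Passing accelerator="auto" achieves a similar effect without any helper. An explicit check is mainly useful when the choice should be logged or should influence other settings.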

[4]:

run_policy_training(
    datamodule,  # the prepared data module for training
    config=training_config,  # the training configuration
    results_path=ranking_policy_network_folder,  # path to save the training results
)


GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name        | Type           | Params | Mode
-------------------------------------------------------
0 | embedder    | GraphEmbedding | 1.3 M  | train
1 | y_predictor | Linear         | 17.9 M | train
-------------------------------------------------------
19.2 M    Trainable params
0         Non-trainable params
19.2 M    Total params
76.944    Total estimated model params size (MB)

Weight decoupling enabled in AdaBelief
Rectification enabled in AdaBelief

`Trainer.fit` stopped: `max_epochs=100` reached.

Policy network balanced accuracy: 0.88
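To interpret this number, recall that balanced accuracy is conventionally defined as the mean of the per-class recalls, so rarely used classes weigh as much as frequent ones. The sketch below implements that standard definition in plain Python on toy labels. It assumes the metric reported above follows the same definition; the exact computation inside SynPlanner is not shown in this tutorial.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, imbalanced example: class 0 is frequent, class 1 is rare.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = balanced_accuracy(y_true, y_pred)
print(plain_accuracy, score)
```

In this toy case the plain accuracy is 0.9, while the balanced accuracy is 0.75 because half of the rare class is misclassified.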

3. Filtering policy training

Filtering network configuration

[5]:

from synplan.utils.config import PolicyNetworkConfig
from synplan.ml.training.supervised import create_policy_dataset, run_policy_training

training_config = PolicyNetworkConfig(
    policy_type="filtering",  # the type of policy network
    num_conv_layers=5,  # the number of graph convolutional layers in the network
    vector_dim=512,  # the dimensionality of the final embedding vector
    learning_rate=0.0008,  # the learning rate for the training process
    dropout=0.4,  # the dropout rate
    num_epoch=100,  # the number of epochs for training
    batch_size=100,  # the size of training batch of input data
)

Creating filtering network training set

Next, we create the policy dataset using the create_policy_dataset function. This involves specifying paths to the reaction rules and the molecules dataset:

[1]:

datamodule = create_policy_dataset(
    dataset_type="filtering",
    reaction_rules_path=reaction_rules_path,
    molecules_or_reactions_path=molecules_data_path,
    output_path=filtering_policy_dataset_path,
    batch_size=training_config.batch_size,
    num_cpus=4,
)

Running filtering policy network training

Finally, we train the policy network using the run_policy_training function. This step involves feeding the dataset and the training configuration into the network:

[ ]:

run_policy_training(
    datamodule,  # the prepared data module for training
    config=training_config,  # the training configuration
    results_path=filtering_policy_network_folder,  # path to save the training results
)

Results

If the tutorial runs successfully, the results folder will contain three reaction data files (from the reaction curation tutorial), the corresponding extracted reaction rules (from the reaction rules extraction tutorial), and the trained ranking and filtering policy networks:

original reaction data
standardized reaction data
filtered reaction data
extracted reaction rules
ranking policy network folder (the training set and trained network)
filtering policy network folder (the training set and trained network)

[3]:

sorted(Path(results_folder).iterdir(), key=os.path.getmtime, reverse=False)

[3]:

[PosixPath('/home1/dima/synplanner/tutorials/tutorial_results/uspto_original.smi'),
 PosixPath('/home1/dima/synplanner/tutorials/tutorial_results/uspto_standardized.smi'),
 PosixPath('/home1/dima/synplanner/tutorials/tutorial_results/uspto_filtered.smi'),
 PosixPath('/home1/dima/synplanner/tutorials/tutorial_results/uspto_reaction_rules.pickle'),
 PosixPath('/home1/dima/synplanner/tutorials/tutorial_results/ranking_policy_network')]