We explore natural product chemistry through the lens of computational modelling.
Using a Transformer-VAE foundation model originally pre-trained on the GuacaMol dataset,
a comprehensive collection of small drug-like molecules, we embed the natural products
of the COCONUT dataset in the model’s latent space.
This approach allows us to investigate how these complex natural compounds are organised,
and how they relate to one another, in a latent space tailored to smaller, drug-like molecules.
Our findings highlight the similarities and differences
between these two classes of molecules. We uncover new perspectives on molecular
similarity and potential bioactivity by examining how natural products, with their
diverse and often complex structures, are represented and structured in a latent space
originally learned from simpler molecules. This research sheds light on the capabilities
and adaptability of pre-trained models in chemical informatics. It could help pave
the way for new approaches to discovering and analysing natural products for
pharmaceutical applications.