The screening of compounds for ADME-Tox targets plays an important role in drug design.
QSPR models can accelerate these tasks, although their performance depends strongly
on several factors, such as the molecular descriptors applied.
In this study, a detailed comparison of the most popular descriptor groups was
carried out for six main ADME-Tox classification targets: Ames mutagenicity, P-glycoprotein
inhibition, hERG inhibition, hepatotoxicity, blood–brain barrier permeability, and
cytochrome P450 2C9 inhibition. Medium-sized binary classification datasets compiled
from the literature (each containing more than 1,000 molecules) were used to build models
with two common algorithms, XGBoost and the RPropMLP neural network. Five molecular
representation sets were compared, both separately and in combination: Morgan, Atompairs,
and MACCS fingerprints, traditional 1D and 2D molecular descriptors, and 3D molecular
descriptors. Model performance was evaluated statistically using 18 different performance
parameters. Although all developed models performed close to the level usually reported
for QSPR models of each ADME-Tox target, the results clearly showed that the traditional
1D, 2D, and 3D descriptors were superior when used with the XGBoost algorithm. Classical
descriptors are therefore worth trying when building single models: for almost every
dataset, 2D descriptors alone can produce even better models than the combination of
all the examined descriptor sets.
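The 18 performance parameters are not listed in this summary. As a hedged illustration of the kind of statistics a binary ADME-Tox classifier is typically judged by, the sketch below computes a few common metrics (sensitivity, specificity, precision, accuracy, balanced accuracy, F1, and the Matthews correlation coefficient) from confusion-matrix counts. The helper name `binary_metrics` and the choice of metrics are assumptions for illustration, not the evaluation code used in the study.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute common binary-classification metrics from confusion-matrix counts.

    Hypothetical helper for illustration only; the metric selection is assumed.
    Ratios with a zero denominator are reported as 0.0 (an assumed convention).
    """

    def ratio(num: float, den: float) -> float:
        # Guard against division by zero for degenerate confusion matrices
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = ratio(tn, tn + fp)  # true negative rate
    precision = ratio(tp, tp + fp)  # positive predictive value
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    balanced_accuracy = (sensitivity + specificity) / 2
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)

    # Matthews correlation coefficient; 0.0 if any marginal total is zero
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ratio(tp * tn - fp * fn, mcc_den)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Made-up counts for a hypothetical test set of 100 molecules
    for name, value in binary_metrics(tp=40, fp=10, tn=45, fn=5).items():
        print(f"{name}: {value:.3f}")
```

Balanced accuracy and MCC are included because, unlike plain accuracy, they remain informative when the active and inactive classes are imbalanced, which is common in ADME-Tox datasets.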