High-content screening is a powerful method for discovering new drugs and carrying out basic
biological research. Increasingly, high-content screens have come to rely on supervised
machine learning (SML) to perform automatic phenotypic classification as an essential
step of the analysis. However, this comes at a cost, namely, the labeled examples
required to train the predictive model. Classification performance increases with
the number of labeled examples, and because labeling examples demands time from an
expert, the training process represents a significant time investment. Active learning
strategies attempt to overcome this bottleneck by presenting the most relevant examples
to the annotator, thereby achieving high accuracy while minimizing the cost of obtaining
labeled data. In this article, we investigate the impact of active learning on single-cell-based
phenotype recognition, using data from three large-scale RNA interference high-content
screens representing diverse phenotypic profiling problems. We consider several combinations
of active learning strategies and popular SML methods. Our results show that active
learning significantly reduces the time cost and can be used to reveal the same phenotypic
targets identified using SML. We also identify combinations of active learning strategies
and SML methods that outperform others on the phenotypic profiling problems
we studied.
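
To illustrate what "presenting the most relevant examples to the annotator" can mean in practice, here is a minimal sketch of one common active learning strategy, least-confidence uncertainty sampling. It is an assumption for illustration only: the abstract does not say which strategies were evaluated, and the function names, the probability values, and the batch size below are all hypothetical.

```python
from typing import List, Sequence


def least_confidence_scores(probabilities: Sequence[Sequence[float]]) -> List[float]:
    """Uncertainty of each unlabeled cell: 1 minus its highest class probability."""
    return [1.0 - max(p) for p in probabilities]


def select_queries(probabilities: Sequence[Sequence[float]], batch_size: int) -> List[int]:
    """Return indices of the `batch_size` most uncertain cells, most uncertain first."""
    scores = least_confidence_scores(probabilities)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:batch_size]


# Hypothetical classifier outputs for four unlabeled cells and three phenotypes
# (illustrative numbers, not data from the screens discussed above).
pool_probabilities = [
    [0.98, 0.01, 0.01],  # confidently classified, little to gain from a label
    [0.40, 0.35, 0.25],  # ambiguous between all three phenotypes
    [0.70, 0.20, 0.10],
    [0.50, 0.45, 0.05],  # ambiguous between two phenotypes
]

# Indices of the cells that would be shown to the expert annotator next.
queries = select_queries(pool_probabilities, batch_size=2)
print(queries)
```

With these illustrative values, the sketch selects cells 1 and 3, the two whose top class probability is lowest. In a full loop, the expert would label those cells, the classifier would be retrained on the enlarged labeled set, and the selection would be repeated until the labeling budget is spent.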