"Lendület" Program of the Hungarian Academy of Sciences (LP2015-2/2015)
Nemzeti Kutatási, Fejlesztési és Innovációs Hivatal (National Research, Development and Innovation Office) (K135561)
European Research Council Starting Grant (715043)
Model selection is often implicit: when performing an ANOVA, one assumes that the
normal distribution is a good model of the data; fitting a tuning curve implies that
an additive and a multiplicative scalar describe the behavior of the neuron; even
calculating an average implicitly assumes that the data were sampled from a distribution
that has a finite first statistical moment, the mean. Model selection may also be explicit,
when the aim is to test whether one model describes the data better
than a competing one. As a special case, clustering algorithms identify groups with
similar properties within the data. They are widely used in applications ranging from spike sorting to cell
type identification to gene expression analysis. We discuss model selection and clustering
techniques from a statistician's point of view, revealing the assumptions behind the various
approaches and the logic that governs them. We also showcase important neuroscience
applications and offer suggestions on how neuroscientists can put model selection
algorithms to best use and which mistakes they should avoid.
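To make the tuning-curve example concrete, the sketch below frames "does this neuron need an additive term?" as an explicit model selection problem. It is a minimal illustration under stated assumptions, not the authors' method: it compares a gain-only model (r = b·f) with an offset-plus-gain model (r = a + b·f) using the Akaike information criterion (AIC), assuming i.i.d. Gaussian noise so that least squares is the maximum-likelihood fit. The helper names (`gaussian_aic`, `fit_tuning_models`), the von Mises-shaped stimulus drive, and the simulated parameter values are all hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_aic(residuals, n_params):
    """AIC for a least-squares fit, assuming i.i.d. Gaussian noise.

    The noise variance is estimated by maximum likelihood (RSS / n)
    and counted as one additional free parameter.
    """
    n = residuals.size
    rss = np.sum(residuals ** 2)
    sigma2 = rss / n
    # Gaussian log-likelihood evaluated at the ML variance estimate.
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    return 2 * (n_params + 1) - 2 * log_lik


def fit_tuning_models(stimulus_drive, rates):
    """Fit r = b*f (gain only) and r = a + b*f (offset + gain); return AICs."""
    f = np.asarray(stimulus_drive, dtype=float)
    r = np.asarray(rates, dtype=float)

    # Gain-only model: a single regressor, no intercept.
    X1 = f[:, None]
    coef1, *_ = np.linalg.lstsq(X1, r, rcond=None)
    aic1 = gaussian_aic(r - X1 @ coef1, n_params=1)

    # Offset + gain model: a column of ones supplies the additive term.
    X2 = np.column_stack([np.ones_like(f), f])
    coef2, *_ = np.linalg.lstsq(X2, r, rcond=None)
    aic2 = gaussian_aic(r - X2 @ coef2, n_params=2)

    return {"gain_only": aic1, "offset_gain": aic2}


# Hypothetical simulated neuron: 32 orientations, a von Mises-shaped drive,
# a true additive offset of 5 and a gain of 20, plus unit-variance noise.
rng = np.random.default_rng(0)
theta = np.linspace(0, np.pi, 32, endpoint=False)
drive = np.exp(np.cos(2 * (theta - np.pi / 4)) - 1)
rates = 5 + 20 * drive + rng.normal(0.0, 1.0, theta.size)

aics = fit_tuning_models(drive, rates)
print(aics)  # the model with the lower AIC is preferred
```

AIC is only one of several criteria one could use here; it trades goodness of fit against the number of parameters. Note that the comparison itself rests on implicit assumptions (Gaussian, independent noise and a fixed tuning shape), which is precisely the kind of hidden model choice described above.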