(2020-1.1.2-PIACI-KFI-2021-00298) Funder: National Research, Development and Innovation
Office
Mesterséges Intelligencia Nemzeti Laboratórium / Artificial Intelligence National
Laboratory (MILAB) Funder: NKFIH
(RRF-2.3.1-21-2022-00006) Funder: National Laboratory for Health Security
(K128780) Funder: National Research, Development and Innovation Office
(Open access funding provided by Semmelweis University)
Statistical learning algorithms rely heavily on a simplifying assumption for optimal
performance: that source (training) and target (testing) data are independent
and identically distributed. Variation in human tissue, physician labeling, and physical
imaging parameters (PIPs) in the generative process yields medical image datasets
whose statistics violate this central assumption. When models are deployed,
new examples are often out of distribution with respect to the training data. Training
robust, dependable, and predictive models therefore remains a challenge in medical imaging,
and significant accuracy drops are common for deployed models. This statistical variation
between training and testing data is referred to as domain shift (DS). To the best
of our knowledge, we provide the first empirical evidence that variation in PIPs between
test and train medical image datasets is a significant driver of DS, and that model generalization
error is correlated with this variance. We show that significant covariate shift occurs
due to a selection bias in sampling from a small area of PIP space in both inter-
and intra-hospital regimes. To show this, we control for population shift,
prevalence shift, data selection biases, and annotation biases to isolate the
effect of the physical generation process on model generalization for a proxy task
of age group estimation on a combined 44k-image mammogram dataset collected from
five hospitals. We hypothesize that training data should be sampled evenly from PIP
space to produce the most robust models, and we hope this study provides motivation to
retain medical image generation metadata, which is almost always discarded or redacted
in open-source datasets. This metadata, measured in standard international units,
can provide a universal regularizing anchor between distributions generated across
the world for all current and future imaging modalities.