Clinical usefulness of scoring systems to predict severe acute pancreatitis: A systematic
review and meta-analysis with pre and post-test probability assessment
Scoring systems for severe acute pancreatitis (SAP) prediction should be used in conjunction
with pre-test probability to establish post-test probability of SAP, but data of this
kind are lacking. This study investigated the predictive value of commonly employed scoring
systems and their usefulness in modifying the pre-test probability of SAP. Following
the PRISMA statement and MOOSE checklist, and after PROSPERO registration, PubMed was searched
from inception until September 2022. Retrospective, prospective, and cross-sectional studies
and clinical trials were included if they enrolled patients with acute pancreatitis defined
according to the Revised Atlanta Criteria, reported the rate of SAP, and reported sensitivity
and specificity for at least one of the following scores: Bedside Index for Severity in Acute
Pancreatitis (BISAP), Acute Physiology and Chronic Health Evaluation (APACHE)-II, Ranson, and
Systemic Inflammatory Response Syndrome (SIRS). Random-effects model meta-analyses were
performed. Pre-test probability and likelihood ratios (LRs) were combined to estimate
post-test probability on Fagan nomograms. The pooled severity rate was used as the pre-test
probability of SAP, and pooled sensitivity and specificity were used to calculate LRs and generate
post-test probability. A priori hypotheses for heterogeneity were developed and sensitivity
analyses were planned. In total, 43 studies comprising 14,116 patients with acute pancreatitis were included:
42 with BISAP, 30 with APACHE-II, 27 with Ranson, and 8 with SIRS. The pooled pre-test probability
of SAP ranged from 16.6% to 25.3%. The post-test probability of SAP with a positive/negative
score was 47%/6% for BISAP, 43%/5% for APACHE-II, 48%/5% for Ranson, and 40%/12% for SIRS.
In 18 studies comparing BISAP, APACHE-II, and Ranson in 6,740 patients with a pooled
pre-test probability of SAP of 18.7%, the post-test probability when scores were positive
was 48% for BISAP, 46% for APACHE-II, and 50% for Ranson. When scores were negative, the post-test
probability dropped to 7% for BISAP, 6% for Ranson, and 5% for APACHE-II. Quality, design,
and country of origin of the studies did not explain the observed high heterogeneity. The
most commonly used scoring systems to predict SAP perform poorly and do not aid in
decision-making.
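
The Fagan nomogram step described in the methods, combining a pre-test probability with a likelihood ratio to obtain a post-test probability, can be sketched in Python as follows. This is a minimal illustration and not the authors' analysis code. The function names are hypothetical, and the sensitivity (0.70) and specificity (0.80) are made-up example values, not pooled estimates from the review. Only the 18.7% pre-test probability is taken from the text.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test, lr):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = pre_test / (1 - pre_test)   # probability -> odds
    post_odds = pre_odds * lr              # Bayes' rule in odds form
    return post_odds / (1 + post_odds)     # odds -> probability


# Pre-test probability from the 18-study comparison reported in the text.
pre_test = 0.187

# ASSUMPTION: illustrative sensitivity and specificity, not values from the review.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.70, specificity=0.80)

p_if_positive = post_test_probability(pre_test, lr_pos)
p_if_negative = post_test_probability(pre_test, lr_neg)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}")
print(f"Post-test probability, positive score: {p_if_positive:.1%}")
print(f"Post-test probability, negative score: {p_if_negative:.1%}")
```

With these illustrative inputs, a positive score raises the probability of SAP from 18.7% to roughly 45%, and a negative score lowers it to roughly 8%. This is the same order of change as the post-test probabilities reported above.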