Ensemble-based structural modeling of flexible protein segments such as intrinsically
disordered regions is a complex task, often addressed by selecting conformers from
an initial pool according to their agreement with experimental data. However, the properties
of the conformational pool are crucial, as the sampling of the conformational space
should be sufficient and, in the optimal case, relatively uniform. In other words,
the ideal sampling is both efficient and exhaustive. Achieving this usually requires specialized
tools, which may not be maintained in the long term, available on all platforms, or flexible
enough to be tweaked to individual needs. Here, we present
an open-source and extendable pipeline to generate initial protein structure pools
for use with selection-based tools to obtain ensemble models of flexible protein segments.
Our method is implemented in Python and uses ChimeraX, SCWRL4, GROMACS and neighbor-dependent
backbone distributions compiled and published previously by the Dunbrack lab. All
these tools and data are publicly available and maintained. Our basic premise is that
residue-specific, neighbor-dependent Ramachandran distributions allow the relevant region
of the conformational space to be explored more efficiently. We
also provide a straightforward way to bias the sampling towards specific conformations
for selected residues by combining different conformational distributions. This allows
a priori known conformational preferences, such as preformed structural elements, to be
taken into account. The open-source and modular nature of the pipeline
allows easy adaptation to specific problems. We tested the pipeline on an intrinsically
disordered segment of the protein CD3ε and on a single alpha-helical (SAH) region
by generating conformational pools and selecting ensembles matching experimental data
using the CoNSEnsX+ server.
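
The idea of drawing backbone angles from per-residue distributions and biasing selected
residues by combining distributions can be illustrated with a minimal sketch. It is not
the pipeline's actual code: the toy probability tables, the function names (`mix`,
`sample_angles`, `sample_chain`), the 10-degree bin width and the linear mixing rule are
all assumptions made for illustration, and the real neighbor-dependent distributions are
much finer grids.

```python
import random

# Hypothetical toy tables: (phi, psi) bin centers in degrees -> probability.
# The numbers are invented for illustration, not taken from any published data set.
COIL = {(-60, -45): 0.3, (-120, 130): 0.4, (-75, 150): 0.2, (60, 45): 0.1}
HELIX = {(-60, -45): 0.9, (-120, 130): 0.05, (-75, 150): 0.05}


def normalize(dist):
    """Return a copy of dist whose probabilities sum to 1."""
    total = sum(dist.values())
    if total <= 0:
        raise ValueError("distribution has no probability mass")
    return {bin_: p / total for bin_, p in dist.items()}


def mix(base, bias, weight):
    """Assumed combination rule: (1 - weight) * base + weight * bias."""
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    base, bias = normalize(base), normalize(bias)
    bins = set(base) | set(bias)
    return {
        b: (1.0 - weight) * base.get(b, 0.0) + weight * bias.get(b, 0.0)
        for b in bins
    }


def sample_angles(dist, rng, bin_width=10.0):
    """Pick a bin by its probability, then jitter uniformly inside the bin."""
    bins = sorted(dist)
    weights = [dist[b] for b in bins]
    phi, psi = rng.choices(bins, weights=weights, k=1)[0]
    half = bin_width / 2.0
    return phi + rng.uniform(-half, half), psi + rng.uniform(-half, half)


def sample_chain(per_residue_dists, rng):
    """Draw one (phi, psi) pair per residue, i.e. one candidate conformer."""
    return [sample_angles(dist, rng) for dist in per_residue_dists]


# Four residues; the middle two are biased towards helical angles.
rng = random.Random(0)
biased = mix(COIL, HELIX, 0.8)
pool = [sample_chain([COIL, biased, biased, COIL], rng) for _ in range(100)]
```

Each entry of `pool` is a list of backbone angle pairs for one conformer; building
coordinates, placing side chains and relaxing the structure would be handled by separate
tools and are outside the scope of this sketch.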