The growing body of sequence data offers new opportunities to discover bacterial quorum
sensing (QS) genes. Here we outline a method that extracts sequence patterns as well as
chromosomal position patterns from known QS genes; these patterns can then be used to find
similar gene arrangements in unannotated sequence data.
Quorum sensing is an autocrine signaling mechanism present in various unicellular
organisms, mainly in bacteria. This mechanism is based on an incoherent feed-forward
network that, by definition, consists of a positive feedback and a negative feedback
control loop. The heart of this mechanism is the positive feedback, or autoinducer,
loop: a signal molecule is produced enzymatically and binds to a sensor-receptor
molecule, which in turn upregulates the production of the signal as well as the expression of
various other genes. The “trick” of this mechanism is that the signal molecule can
leave as well as re-enter the cell, either by active or by passive transport.
The signal molecules outside the cell can influence the metabolism of other cells,
so there is de facto communication between cells. If the signal concentration in
the extracellular space is high enough, gene expression in all concerned cells
is upregulated, so the functioning of the cell population becomes synchronized.
This simple phenomenon allows a cell population to solve problems that individual
cells cannot tackle, such as colonizing a surface or infecting a host organism.
The negative feedback loop, on the other hand, plays a stabilizing role: it prevents
signal production from growing without limit.
As an example, luxR genes encode a transcriptional regulator that controls acyl homoserine
lactone-based quorum sensing (AHL QS) in many Gram-negative bacteria. In this system
the AHL signal travels through the bacterial cell wall via passive transport. On the
bacterial chromosome, luxR genes are usually in the direct vicinity of a luxI gene
encoding the AHL signal synthase. Genes involved in stabilization are often in between
these two genes or are located next to them. Over 15 operon types were observed in
the AHL signaling family alone.
Another well-studied QS mechanism, found in Gram-positive bacteria, is the comQXPA locus of
Bacillus subtilis and related bacteria, which encodes a QS system consisting of four genes.
Here the signal is a peptide that is transported across the membrane via active transport.
The peptide in the extracellular space is sensed by the extracellular part of a transmembrane
receptor, ComP. The intracellular part of the receptor is a histidine kinase that
phosphorylates a DNA-binding protein, ComA. Once phosphorylated, ComA binds
to the chromosome and upregulates the production of the ComX protein, which includes the
peptide signal. This protein is then cleaved by a transmembrane protein, ComQ,
which pumps the peptide signal out into the extracellular space. The autoinducer
loop of this system thus consists of four proteins and includes active transport, in
contrast to the AHL system, where the autoinducer loop consists of only two proteins
and is based on passive transport. On the other hand, the topologies of the comQXPA
genes are quite conserved; minor differences occur only in the overlap of the concerned
genes.
A preliminary overview of the current literature revealed about 20 further well-studied
quorum sensing systems. Comprehensive sequence collections were published on the AHL
and the comQXPA systems approximately 5 years ago, but the body of available bacterial
sequences has grown about 10-fold in the meantime, so a survey of the new data is an important
task. The challenge of such a survey is the variability of the QS systems. Importantly,
we can reliably detect only known, and indeed only well-characterized, QS systems. Our strategy
generalizes the logic of our previous surveys: a QS system is treated
as a structure of entities and relationships, that is, a graph
in which protein-coding genes are the nodes and intergenic distances are the edges.
To detect such a gene set within a chromosome we need parsimonious
and scalable solutions, because the number of sequences to be screened is now in the many
millions and is growing exponentially.
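The graph view described above can be made concrete with a small data structure. The following Python sketch is only an illustration under stated assumptions: the class `Gene`, the functions `locus_graph` and `topology_string`, the coordinate convention (1-based, inclusive) and the example coordinates are all hypothetical and are not taken from our surveys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str    # gene family label, e.g. "luxR" (illustrative annotation)
    start: int   # 1-based start coordinate on the chromosome
    end: int     # inclusive end coordinate
    strand: str  # "+" or "-"


def locus_graph(genes):
    """Return (nodes, edges): genes ordered by position, plus one edge per
    pair of neighbouring genes, weighted by the intergenic distance in bp."""
    nodes = sorted(genes, key=lambda g: g.start)
    edges = []
    for left, right in zip(nodes, nodes[1:]):
        # A negative distance means the two coding regions overlap.
        distance = right.start - left.end - 1
        edges.append((left.name, right.name, distance))
    return nodes, edges


def topology_string(genes):
    """Compact one-line description of a locus, e.g. for result tables."""
    nodes, edges = locus_graph(genes)
    if not nodes:
        return ""
    parts = [f"{nodes[0].name}({nodes[0].strand})"]
    for node, (_, _, distance) in zip(nodes[1:], edges):
        parts.append(f"-[{distance}]-{node.name}({node.strand})")
    return "".join(parts)


# Toy coordinates, invented purely for illustration.
example_locus = [
    Gene("luxI", 900, 1500, "+"),
    Gene("luxR", 100, 850, "-"),
]
print(topology_string(example_locus))  # luxR(-)-[49]-luxI(+)
```

Storing only neighbouring-gene distances keeps the description linear in the number of genes; overlaps between adjacent genes simply appear as negative edge weights.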
One of the problems complicating the situation is truncated topologies, i.e. operons
in which one or more members are missing. For instance, AHL operons contain only two
genes, luxI and luxR, but many bacteria contain solo luxR genes, i.e. receptor genes
with no signal synthase. Solo luxR genes are even more frequent than complete AHL
operons. This can have many reasons. For instance, the survey may not have picked up the
adjacent luxI homolog, either because its sequence is too divergent or because it is
located further away on the chromosome. Or the luxR gene may control an unknown type
of signal synthase (as has been found in a few cases). Or the solo luxR protein may respond
to an unknown type of signal. With more complex operons the situation is even more
complicated, so we designed a search algorithm that employs a hierarchy of search
space reduction steps, based on a hierarchy of molecular descriptions, namely
presence-absence, composition and full structure descriptions. For instance, if we
have an operon of four members, we keep only genomes in which at least three of
the components are present. Of these, we concentrate on genomes in which the required
number of elements is present within a certain distance, a value observed in the
known instances of the operon. Finally, we determine the intergenic distances and record
the topology found. This is a highly efficient search space reduction strategy, since the
compute-intensive steps are limited to very few cases.
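The three reduction levels described above can be sketched as a chain of filters. This is a minimal Python sketch, not our production code: it assumes that gene families have already been assigned by a prior sequence search (not shown), and the function names, the `(family, start, end)` tuple layout, the 6000 bp span limit and the toy coordinates are all hypothetical.

```python
def presence_filter(genes, required, min_present):
    """Level 1 (presence-absence): are enough required families in the genome?"""
    found = {family for family, _, _ in genes}
    return len(found & set(required)) >= min_present


def composition_filter(genes, required, min_present, max_span):
    """Level 2 (composition): return the window of hits that holds the most
    required families within max_span bp, or None if no window qualifies."""
    hits = sorted((g for g in genes if g[0] in required), key=lambda g: g[1])
    best, best_count = None, 0
    for i, first in enumerate(hits):
        # All hits that end within max_span of the start of the first hit.
        window = [g for g in hits[i:] if g[2] - first[1] <= max_span]
        count = len({g[0] for g in window})
        if count >= min_present and count > best_count:
            best, best_count = window, count
    return best


def describe_structure(cluster):
    """Level 3 (full structure): gene order and intergenic distances."""
    ordered = sorted(cluster, key=lambda g: g[1])
    gaps = [b[1] - a[2] - 1 for a, b in zip(ordered, ordered[1:])]
    return [g[0] for g in ordered], gaps


def screen(genomes, required, min_present, max_span):
    """Apply the cheap filters first so only a few genomes reach level 3."""
    results = {}
    for genome_id, genes in genomes.items():
        if not presence_filter(genes, required, min_present):
            continue
        cluster = composition_filter(genes, required, min_present, max_span)
        if cluster is not None:
            results[genome_id] = describe_structure(cluster)
    return results


# Toy input: (family, start, end); all coordinates are invented.
toy_genomes = {
    "g1": [("comQ", 1000, 1900), ("comX", 1890, 2060),
           ("comP", 2080, 4390), ("comA", 4470, 5110)],
    "g2": [("comQ", 1000, 1900), ("comP", 500000, 502300),
           ("comA", 900000, 900640)],          # three families, but scattered
    "g3": [("comP", 100, 2400), ("comA", 2480, 3120)],  # only two families
}
result = screen(toy_genomes, ["comQ", "comX", "comP", "comA"],
                min_present=3, max_span=6000)
print(result)
```

In this toy run only g1 survives: g3 is rejected at the presence-absence level, and g2 at the composition level because its three hits do not fall within the span limit. A negative distance in the output indicates overlapping genes.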
An additional challenge is the recent surge in the volume of next-generation sequencing
data obtained from various bacterial systems. Such datasets, for example those in the
NCBI Sequence Read Archive (SRA), consist of many millions of reads each, and it is important
to know whether QS systems are present or active in them. The collection of QS genes developed
within our project will be a useful tool for detecting such genes directly from reads
and from metagenomics datasets.
The long-term goal of this project is to develop automated protocols to extract QS genes
from genomic data. The output would be the topological description of the QS operons
along with a quality indicator characterizing the reliability of the prediction. For
the predictions we will use Hidden Markov Models as well as fast sequence comparison
programs (Bowtie or BWA) ported to parallel architectures such as GPUs and FPGAs, which
will be used as search engines of dedicated web servers.