Decoding Bacterial Genomes

To understand what makes people tick, we look at more than just their family histories. We look at their significant others, friends, and colleagues and consider how and why those relationships work. We also consider the ways they communicate.

The same is true of understanding bacteria, but analyzing relationships outside of the family history is more complicated at the microscopic level. Most current technology cannot effectively analyze genetic relationships between different species, even though those relationships could increase biologists’ understanding of the bacteria that cause infection and carry antibiotic resistance.

Computer scientists at Washington State University are looking to change this.

With a three-year National Science Foundation grant, WSU professors Shira Broschat, Ananth Kalyanaraman, and Douglas Call are analyzing a group of approximately 18,000 genomes, the most ever attempted. Broschat and Kalyanaraman are professors from the School of Electrical Engineering and Computer Science, and Call is from the School for Global Animal Health.

The genomes come from the Proteobacteria community, one that includes many infection-causing bacteria like E. coli O157:H7 and Salmonella. The community presents two major challenges, one of scale and one of bacterial sex.

A good example of the bacterial sex problem is non-typhoidal Salmonella. The bacterium, which causes infections in humans and animals, does not keep antibiotic resistance in the family; instead it shares antibiotic resistance genes with different species via bacterial sex.

Such gene transfers mean that bacteria end up with genes that are not part of their evolutionary history, or phylogeny, which is the basis on which biologists currently analyze and categorize bacteria. Taking interspecies gene transfers into account could give biologists a more accurate and precise understanding of the functions of genes that affect traits such as a bacterium’s likelihood of causing infection or resisting antibiotics.

That introduces the problem of scale.

To understand which genes are responsible for given traits, scientists need to be able to analyze different species of bacteria side by side to determine which genes they share. While current analysis techniques can examine a handful of genomes at a time, they cannot handle the thousands of genomes within one community of bacteria.

“What we’re trying to do is not considered possible because of the scalability,” says Kalyanaraman, EECS associate professor. “Large computers with high capacity and capabilities are required to process that much data.”

Kalyanaraman began developing software that allows analysis at large scales in 2008. Using supercomputers that are about 10,000 to one million times faster than the average desktop computer, the software, called pClust, can cluster bacterial proteins based on shared characteristics.

“We will end up with clusters of proteins that conceivably perform similar functions in the different species,” Broschat says, “and if the function of one gene in a cluster is known, it gives us clues as to the function of the genes in other organisms.”
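The idea Broschat describes, grouping similar proteins and then borrowing a known function for the rest of the group, can be illustrated with a toy Python sketch. This is not pClust's algorithm, and it would not scale to thousands of genomes; it is a minimal illustration under stated assumptions. The sequences, the protein names, the shared-fragment similarity measure, and the 0.5 threshold are all hypothetical choices made for the example.

```python
from itertools import combinations


def kmers(seq, k=3):
    """Return the set of length-k fragments of a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Share of fragments two sets have in common (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(proteins, threshold=0.5, k=3):
    """Group proteins that are linked by pairwise similarity (union-find)."""
    parent = {name: name for name in proteins}

    def find(x):
        # Follow parent links to the group's representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    profiles = {name: kmers(seq, k) for name, seq in proteins.items()}
    for a, b in combinations(proteins, 2):
        if jaccard(profiles[a], profiles[b]) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for name in proteins:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


def infer_functions(clusters, known):
    """Suggest a function for unannotated proteins from their cluster mates."""
    hints = {}
    for members in clusters:
        labels = {known[m] for m in members if m in known}
        if len(labels) == 1:  # only transfer an unambiguous annotation
            label = next(iter(labels))
            for m in members:
                if m not in known:
                    hints[m] = label
    return hints


# Hypothetical sequences and annotation, invented for illustration only.
proteins = {
    "ecoli_A": "MKTAYIAKQR",
    "salmonella_A": "MKTAYIAKQK",
    "ecoli_B": "GSHMLEDPVW",
}
known = {"ecoli_A": "antibiotic resistance"}

clusters = cluster(proteins)
hints = infer_functions(clusters, known)
print(clusters)
print(hints)
```

Comparing every pair of proteins, as this sketch does, is exactly the step that becomes unmanageable as the number of genomes grows, which is why the real analysis calls for supercomputers.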

Most biologists do not have access to the large processors required to do such clustering, so the WSU team is making their software compatible with cloud computing.

“Regardless of how powerful any software package is, if it isn’t usable by the people for whom it’s meant, then it may as well not exist,” Broschat says.

“This type of program in a cloud-based environment shields the user from the technicalities of high performance computing that is on the backend,” Kalyanaraman says. “Biologists should just be able to double click an icon on their desktop computer and access the database.”