Posted: October 17, 2011
CASE STUDY: Using an Algorithm to Find “Driver” Mutations
Emma J. Spaulding
“Don’t be shy.”
That’s the advice Ben Raphael, Ph.D., would give a young scientist. Following it has led him to where he is today: an Associate Professor in the Department of Computer Science and the Center for Computational Molecular Biology at Brown University, an investigator for The Cancer Genome Atlas (TCGA), and winner of a 2011 National Science Foundation CAREER Award. He has taken opportunities as they came and, at times, created his own.
While Dr. Raphael was a postdoc at the University of California, San Diego, studying differences in large-scale genome rearrangements among humans, mice and rats, his mentor, Pavel Pevzner, Ph.D., told their lab group that he had received an email asking for help characterizing large-scale rearrangements in cancer. “One day, in one of our group meetings, Pavel said, 'I was contacted by these cancer biologists who have some data. Their project is something with cancer and rearrangements. Is anyone interested in looking at this?'” Dr. Raphael volunteered and began working with prostate cancer researcher Colin Collins, Ph.D., who invented the precursor to modern paired-end sequencing. Through this collaboration, Dr. Raphael's interest began to shift from comparative genomics to rearrangements in cancer.
Moving into Cancer Research Without Leaving Mathematics Behind
Later in his career, Dr. Raphael was running his own lab at Brown University. In 2009, the Raphael lab joined the TCGA analysis working group to contribute its expertise in large-scale rearrangements in ovarian cancer. But his Ph.D. training in mathematics had also taught him to look at the “big picture.” He says, “As a mathematician you’re trained to be a generalist, to find the general problem.”
Listening to other group members’ presentations, Dr. Raphael noticed a common theme in their problems.
“When [they] looked at the mutation data… and tried to find the genes that were recurrently mutated and were potentially the important cancer driver genes, it was hard because there was so much heterogeneity and so many differences between individual tumors,” he explained. The amount of data coming from TCGA is vast. To draw an analogy, imagine all of the TCGA data as a beach. The lightest, clearest grains of sand are mutations. Hidden among them is the rare mutation that perturbs a pathway and drives cancer, like a small diamond among all the light, clear grains of sand. The metaphor makes the problem plain: there were too many mutations, with too much diversity among them, to classify and analyze easily.
“I thought there was a need for perhaps another way of looking at the data,” Dr. Raphael says, “So that’s what we developed in our group: an algorithm for looking at groups of genes that were mutated at a significant frequency.” He reasoned that the best way to organize the disparate data was to group mutated genes and search for “driver” pathways. “Driver” pathways are sets of interacting proteins that, when mutated, could potentially cause cancer. “Passenger” pathways also carry mutations, but those mutations are just “along for the ride.”
To sift through all the mutations and find the drivers, Dr. Raphael and his group created the HotNet algorithm, designed with several considerations in mind. Tumors rarely share exactly the same mutations, and many different mutations can disrupt the same key pathway. Pathways also interact with one another through crosstalk. As a result, counting mutations gene by gene can miss a pathway that is hit in many tumors, just at different genes. Dr. Raphael therefore wanted HotNet to identify the most frequently mutated subnetworks of the protein interaction network.
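To make the idea of grouping genes concrete, here is a toy Python sketch. It is not HotNet: the published algorithm is considerably more sophisticated (see the paper cited below). It only illustrates why looking at connected groups of genes helps when tumors are heterogeneous. The network `INTERACTIONS`, the tumor data `TUMORS`, the functions `coverage` and `frequently_mutated_subnetworks`, and the `min_coverage` threshold are all hypothetical. The sketch keeps mutated genes, splits them into connected subnetworks of the interaction graph, and counts a tumor toward a subnetwork if any of its genes is mutated.

```python
from collections import deque

# Hypothetical toy protein interaction network (adjacency sets).
INTERACTIONS = {
    "A": {"B"}, "B": {"A", "C"}, "C": {"B"},
    "D": {"E"}, "E": {"D"},
}
# Hypothetical mutated genes observed in each of five tumors.
TUMORS = [{"A"}, {"B"}, {"C"}, {"A", "D"}, {"E"}]


def connected_subnetworks(genes, graph):
    """Breadth-first search for connected components of `graph` restricted to `genes`."""
    seen, components = set(), []
    for start in sorted(genes):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            gene = queue.popleft()
            component.add(gene)
            for neighbor in graph.get(gene, ()):
                if neighbor in genes and neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


def coverage(genes, tumors):
    """Fraction of tumors with at least one mutation in `genes`."""
    return sum(1 for tumor in tumors if tumor & genes) / len(tumors)


def frequently_mutated_subnetworks(graph, tumors, min_coverage):
    """Return (sorted genes, coverage) for subnetworks hit in >= min_coverage of tumors."""
    mutated = set().union(*tumors)
    results = []
    for component in connected_subnetworks(mutated, graph):
        cov = coverage(component, tumors)
        if cov >= min_coverage:
            results.append((sorted(component), cov))
    return results


for genes, cov in frequently_mutated_subnetworks(INTERACTIONS, TUMORS, min_coverage=0.5):
    print(genes, cov)  # prints: ['A', 'B', 'C'] 0.8
```

In this toy data no single gene is mutated in more than two of the five tumors, yet the subnetwork {A, B, C} is hit in four of them. That is the kind of signal a gene-by-gene count would miss.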
Putting HotNet to the Test
To test HotNet, Dr. Raphael applied the algorithm to the TCGA glioblastoma multiforme (GBM) dataset. From that dataset, the algorithm extracted subnetworks of highly mutated genes. Those subnetworks were compared with genes in pathways previously implicated in GBM, and the overlap was significant: HotNet's results agreed with those reported in TCGA’s GBM paper. Through this analysis, Dr. Raphael and his collaborators were the first to demonstrate a computationally efficient strategy for de novo identification of statistically significant mutated subnetworks. The findings were detailed in a landmark paper in the Journal of Computational Biology.1 With HotNet validated, the group applied it to the TCGA ovarian cancer dataset, where it identified significantly mutated portions of the protein-protein interaction network.
Between the expansion of projects such as TCGA and the falling cost of sequencing, the amount of available data will continue to multiply, and HotNet is likely to become an even more valuable research tool as a result. According to Dr. Raphael, “It’s becoming straightforward to make a list of mutations in any individual genome and, now it’s about finding which ones are the cancer mutations.” HotNet need not be limited to distinguishing driver from passenger mutations and pathways; the algorithm could be extended to examine additional data types or interaction types. In the future, Dr. Raphael envisions an algorithm that extracts the causative mutations from a patient’s genome and translates that information into a drug combination designed specifically for that patient. As he puts it, “[Cancer] is a really complex host of diseases. Not only do we need a lot of data but we need the right algorithms and the right methods.”
1 Vandin, F., Upfal, E. and Raphael, B.J. (2011) Algorithms for detecting significantly mutated pathways in cancer. J Comput Biol. 18(3): 507-522.