“When I first applied to the computer science (CS) Ph.D. program at Rice, my interest was in phylogenetic modeling – specifically from the graph theory perspective,” said alumnus Hyun Jung Park. “I was interested in improving phylogenetic models to accurately reflect complex events such as incongruence between gene trees and species trees.”
Park said his initial interest was in developing the models themselves, but he then realized he was growing more intrigued by the systems the models were used to study. He said, “That shift was natural for me, because Luay [Nakhleh] urged us to think of the problems we looked into – not just from the mechanics of solving the problem, but the actual problem the scientists were trying to solve. Don’t just focus on the method, focus on the goal.
“During my Ph.D. training, my interest began shifting after taking several seminar classes about human cancer at the Texas Medical Center (TMC). One of the hot topics in the cancer field was how tumor heterogeneity develops, and so my interest turned to understanding tumor heterogeneity. I wondered how we could better understand tumor heterogeneity if we used better models built on phylogenetic models.”
Although Park’s area of research shifted, his career plans did not. He was determined to secure a tenure-track faculty position at a university with a medical center, so he spent several years building his credentials as a postdoctoral fellow in the lab of Wei Li in the Dan L. Duncan Cancer Center at Baylor College of Medicine.
In 2018, Park began a tenure-track position as an assistant professor in the Department of Human Genetics at the University of Pittsburgh, and he attributes his success to both Li’s and Nakhleh’s training. He said, “One of the great advantages of the CS Ph.D. program at Rice is how it helps you find out for yourself what you are best at and how you can best put your skills and strengths to use. Its close proximity to TMC is another advantage of the Ph.D. program at Rice. There are many biologists in TMC with interesting hypotheses, and I was interested in developing computational tools for them, but in a way that would allow other scientists to use the tool to test their own hypotheses in their context or with their own data sets. During my postdoctoral training, Wei provided me with opportunities to do that.”
He explained that biologists have their own interests, like particular types of human cancer. “If we develop a model for breast cancer, can it then be used for studying ovarian cancer or other types of cancer? I’d like for scientists to be able to run their data and test their hypotheses. That’s the typical trend for computer scientists in the medical industry now, to develop computational models for a broader use.”
Park thinks computer scientists will play bigger roles in the medical industry. He said the growing availability of biological big data has prompted scientists to begin merging deep knowledge of biology with quantitative machine learning techniques to develop new biological hypotheses on a global scale.
“Traditionally, hypotheses developed individually – by biologists who are experts in their domains – have formed the basis for biological research activities. But now we can use data mining skills and deep learning algorithms to generate biological hypotheses. Then, the biological hypotheses generated based on big data can account for extensive interactions between biological entities, which should be taken into account.
“For example, a traditional biological hypothesis may state that down-regulation of Gene A may promote or inhibit a particular disease. In contrast, using data mining techniques, a new hypothesis might state that a class of genes (1,000+) is affecting another class of genes, collectively promoting a particular disease.”
Those same advances in technology help Park and other scientists capture the global landscape of RNA changes. He said he chose to study RNA dynamics because they can be modeled in the simplest possible way. RNAs interact with one another in a straightforward way, similar to the obvious connections in a toddler’s first puzzle.
Other biological components such as transcription factor binding would involve multiple interactions and influencers, resulting in complex interactions, more like trying to connect circuits than a child’s puzzle pieces.
“But with RNAs, we don’t have to know the rest of the story,” said Park. “They either interact, or they don’t. So far, scientists have been focusing their hypotheses on a small number of RNAs – maybe only a few out of thousands, or tens out of thousands – and how those interactions affect phenotypes or whatever. Then there are other scientists critiquing their findings and saying that on a level, this small percentage of RNAs would not make any significant difference in biological systems.
“That made sense to me. Lining up 100 to 100 and modifying a limited number – let’s say three – of RNAs on one side shouldn’t affect the other side because the remaining 97 RNAs are the ones influencing the outcome of the interaction.
“Then I realized that my CS skills should come into play, to better understand these interactions. CS skills that can be used to solve intractable problems can effectively illustrate the interactions in a meaningful and efficient way.”
Park said his Ph.D. training and CS skills are expected to play a role in research into abnormalities in RNA levels that cause disease, promote cancer, and so forth. In a study recently published in Nature Genetics, he built an integrated network and followed it with statistical modeling. Park expects to make comprehensive predictions on the tumorigenic process in his subsequent work.
“To see what is going on in the RNA world, we can use a computational simulation. Using computational simulations, we can crunch more than 1000 factors simultaneously,” said Park.
“But these days, enormous amounts of data are being collected in many fields beyond biology. It’s already impacting the study of psychology, political science, sociology, economics, and other disciplines. Because of the amount of data, there are now so many things computer scientists can get involved in.
“My advice to other CS Ph.D. students is to keep developing your interests, not only in the methodology, but also in the kinds of problems left to be solved across diverse fields.”