Research in Focus: Needles, Haystacks, and Sugar Chains

Dukka KC, Department of Computer Science

From the 2024 Michigan Tech Magazine

A glycobiologist and a computer scientist join forces to find protein-bound glycans faster—a major advance in the understanding of human health and disease.

Shortly after Dukka KC arrived on campus in fall 2021, the new professor of computer science emailed Michigan Tech glycobiologist Tarun Dam. The note, as Dam remembers it, said in essence, “I see that you’re a glycobiologist. I do some bioinformatics work with glyco-molecules. Maybe we can work together someday.”

The two soon had lunch. They found out they come from neighboring countries in Asia, and became fast friends. “We talked for hours,” remembers KC. “Tarun was so great and so enthusiastic. His passion is contagious! He loves glycobiology.”

As strong as their friendship has become, the foundation of their professional collaboration, according to Dam, is the science. “After getting to know Dukka and the work that he does, I was very impressed. I thought it was a really great opportunity to collaborate.”

But how does the field of bioinformatics, which uses advanced computer technology to collect and analyze biological data, apply to glycobiology—the study of the structure and biological function of carbohydrates? The answer requires a brief science lesson.

Proteins are chains of amino acids. These chains can be 2,000 or more amino acids in length, but they compact themselves to conserve space. In order to function, protein chains require additional attachments—what science calls post-translational modifications, or PTMs.

Dam eloquently refers to them as ornaments.

Many of these ornaments are sugar molecules, or glycans, and they attach themselves to protein chains through a process known as glycosylation. This process can only occur on specific amino acids, and only when they’re linked together in certain combinations. Sometimes, though, even the correct sequence of amino acids will not have the expected ornament arrangement—a situation Dam calls “a ridiculous problem to have,” especially when knowing the location of these glycans is essential to understanding health and disease.

“When people get cancer, their sugar ornamentation pattern often changes,” says Dam. “So scientists need to know which amino acids in a protein are occupied by glycan ornaments, which ones are empty, and when the pattern changes are happening.” Traditionally, glycobiologists have done this work experimentally in the lab. “We cut a protein, we identify the sugar chains, and then we write a paper. That’s a difficult job. It takes expertise. It takes money. It takes time. Sometimes it takes years to produce results to understand the whole picture.”

“What we do, what we are—it’s all because of proteins.”

Tarun Dam, Department of Chemistry

professor of chemistry; director of the biochemistry and molecular biology bachelor’s program

When Dam got KC’s email, he saw the opportunity to use the power of machine learning to make the process of glycobiology more efficient.

Working with collaborators from Wichita State University, Kansas State University, the University of Houston, and Soka University in Tokyo, Japan, Dam and KC began studying the glycosylation of an amino acid known as asparagine. Glycans that attach to asparagine are called N-linked glycans, and they can only attach to asparagine if it has two specific amino acids on its right-hand side. The first can be any of the 20 common amino acids except proline, and the second must be either serine or threonine.
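The attachment rule described above can be written as a short pattern search. The following is a minimal sketch, not the team's method: it assumes a protein is given as a string of standard one-letter amino acid codes (N for asparagine, P for proline, S for serine, T for threonine), and the function name and toy sequence are hypothetical. It only lists candidate asparagines. As the article notes, a correct sequence does not guarantee that a glycan is actually attached, which is the part a predictive model has to decide.

```python
import re

# The 20 common amino acids minus proline (P), in one-letter codes.
NOT_PROLINE = "ACDEFGHIKLMNQRSTVWY"

# Asparagine (N), then any common residue except proline, then serine (S)
# or threonine (T). The lookahead keeps overlapping candidates
# (e.g. the two N's in "NNSS") from hiding one another.
CANDIDATE = re.compile(rf"N(?=[{NOT_PROLINE}][ST])")


def find_candidate_asparagines(sequence: str) -> list[int]:
    """Return 1-based positions of asparagines that satisfy the rule."""
    return [m.start() + 1 for m in CANDIDATE.finditer(sequence.upper())]


# Hypothetical toy sequence for illustration, not a real protein.
toy = "MKNGTAANPSLLNNSSQ"
print(find_candidate_asparagines(toy))  # [3, 13, 14]
```

In the toy sequence, the asparagine at position 8 is skipped because proline follows it, while positions 3, 13, and 14 each meet the rule and would be the sites worth examining further.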

With funding from the National Science Foundation, Dam, KC, and their collaborators developed LMNglyPred, a deep-learning-based approach to predict N-linked glycosylated sites in human proteins using embeddings from a pre-trained language model.

“Now, without even doing any experiment, I can go to a computer scientist like Dukka, and if I give them a known protein, they can predict and say, ‘Okay, you have five asparagines in that protein. Three will have sugar chains in this location. Two will not.’ Without any experiment!” says Dam. “That’s the power of machine learning and of this collaboration.”

Like all good teachers, KC explains his part of the project using a familiar metaphor.

“We cannot for sure say, ‘This is where the needle is,’” KC says. “But we can really narrow that search space, so that instead of looking at everything, we can just search the area where it’s most likely going to be.”

“In bioinformatics, we often say we are not trying to find a needle in a haystack. We are trying to find the few spots in the haystack where needles are most likely going to be.”

Dukka KC, College of Computing, Department of Computer Science

KC says the fields of computer science and glycobiology have been collaborating for nearly two decades, but using deep learning tools and large language models in this work is very new—so new that KC thinks he, Dam, and their collaborators may be the first to have used a language model to predict glycosylation. The implications for glycobiologists like Dam—and for the medical field as a whole—are potentially immense.

Yet for all the giant leaps made by large language models and other artificial intelligence tools in recent years, KC is quick to acknowledge the foundation on which his work rests.

“It’s a loop, right?” says KC. “Experimentalists like Tarun generate all this data, which we use to train our model. We then use the protein language model to inform more and better experiments. So their data helps our model get better, and our model helps their experiments get better. It’s a loop.”

Dam says he and KC have “huge ideas” for further collaboration on proteins—specifically those with biomarkers for cancer.

“Tarun only cares about sugars,” KC says with a chuckle, teasing Dam like an old friend. “Not to diminish anything about sugar, but there are 400 post-translational modifications, and glycosylation is only one of those 400. But when I talk to Tarun, he makes it sound like that ornament is the only important thing in the world.”

Dam gives the good-natured ribbing right back to KC. “Yes, yes, he studies the other ornaments, too,” Dam says with mock dismissal. “But no other modifications affect 70 percent of proteins. Only glycosylation affects the protein from birth to death. I try to convince Dukka to do more with glycobiology. It’s so vast, and so important.”

KC laughs, then concedes that he and Dam are considering exploring the “cross-talk between phosphorylation and O-GlcNAc.” His explanation of the importance of these modifications to biology and human health elicits a nod of admiration from his friend and colleague.

“Dukka is not a glycobiologist, but he understands the significance of it almost like he is one,” says Dam. “It’s fun working with him. We respect each other’s expertise. Both of our labs are doing work that is significant, and that significance will only grow as our collaboration continues.”

KC agrees. “We have other interests of our own, of course, but we found some common interests—these sugars that kind of bind us.”
