From Sequence to Significance: Machine learning for functional prioritization of Post-Translational Modifications

Nolan English
School of Biological Sciences
Advisor: Dr. Matthew Torres (School of Biological Sciences)

Committee Members:
Dr. Melissa Kemp, School of Biomedical Engineering; Georgia Institute of Technology
Dr. Raquel Lieberman, School of Chemistry and Biochemistry; Georgia Institute of Technology
Dr. Peng Qiu, School of Biomedical Engineering; Georgia Institute of Technology
Dr. Christopher Rozell, School of Electrical and Computer Engineering; Georgia Institute of Technology

Abstract:
Post-translational modifications (PTMs) provide an extensible framework for regulating protein behavior beyond the diversity represented within the genome alone. While the rate of PTM identification has increased rapidly in recent years, our knowledge of PTM functionality remains limited. Fewer than 4% of all eukaryotic PTMs are reported to have a biological function, despite their ubiquity across the proteome. This percentage continues to decrease as the pace of PTM identification surpasses the rate at which PTMs are experimentally studied. To bridge the gap between identification and interpretation, we have developed SAPH-ire (Structural Analysis of PTM Hotspots), a machine learning-based tool for prioritizing PTMs for experimental study by functional potential. In this thesis, I aim to expand SAPH-ire’s functionality to predict potential function and to improve its performance in ranking PTMs by functional potential. I will first discuss some challenges facing computational PTM research from an informatics perspective. I will then discuss the creation of new resources to address these challenges in four objectives. First, the creation of a new data resource that captures experimental data from mass spectrometry experiments designed to focus on PTMs. Second, the renovation of the SAPH-ire machine learning model to improve model performance and predictive recall. Third, the generation of a new model capable of discerning function from functional potential and structural features. Fourth, the development of a visual interface for SAPH-ire and the data resource that enhances one’s ability to understand the model’s results and drives further study.

Event Details

Date: 
Friday, October 12, 2018 - 12pm

Location:
Ford Environmental Science & Technology Building, Room L1118