Isabel Armour-Garb will present her General Exam "Leveraging Protein Language Models for Protein Characterization" on Friday, May 3, 2024 at 10:00 AM in Icahn 280.
Committee Members: Mona Singh (advisor), Olga Troyanskaya, Yuri Pritykin

Abstract: Genomes are being sequenced much faster than experimental efforts can functionally characterize the proteins they contain. Computational methods based on sequence similarity are the first line of attack for annotating sequences with protein function. However, no functional annotations exist for over 1.1 billion protein sequences, and thousands of human proteins lack sufficient characterization. To understand protein function, it is important to identify functional sites (e.g. amino acids involved in crucial interactions such as catalysis or post-translational modifications), which are highly conserved across homologs. State-of-the-art methods align homologs in a multiple sequence alignment to find evolutionarily conserved positions, so for proteins with few known homologs they cannot predict evolutionary conservation. Protein language models (PLMs) are advanced deep learning natural language processing models specifically designed to understand and generate protein-related texts and sequences. These models represent each amino acid within a protein sequence as a high-dimensional vector, effectively capturing complex information about the amino acids and their sequence context, and providing a way to encode and process the inherent features of the protein as an embedding for computational analysis. First, we introduce vcMSA, a new method for more accurate multiple sequence alignment using PLMs. Then, leveraging PLMs, we introduce a new method for accurately predicting functional sites given a single protein sequence, which allows us to better characterize proteins of unknown function regardless of the presence or absence of known homologs.
Reading List: https://docs.google.com/document/d/1thOoMwpz7H1JrgQKYH9Va-Xf5ZyEQnJV-11VeNBh...

Everyone is invited to attend the talk, and those faculty wishing to remain for the oral exam following are welcome to do so.