Chaitanya Aluru will present his FPO "Reconciliation-Based Methods for Identifying the Evolutionary Origins of Tandem Duplications in Repeat Domain Families" on Friday, 12/17/2021 at 1:30PM via Zoom and in CS 105
Chaitanya Aluru will present his FPO "Reconciliation-Based Methods for Identifying the Evolutionary Origins of Tandem Duplications in Repeat Domain Families" on Friday, 12/17/2021 at 1:30PM via Zoom and in CS 105.

Zoom link: https://princeton.zoom.us/j/99860934740

The members of his committee are as follows:
Adviser: Mona Singh
Readers: Ben Raphael, Bernard Chazelle, Mona Singh
Examiners: Mona Singh, Olga Troyanskaya, and Barbara Engelhardt

A copy of his thesis is available upon request. Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend the talk.

Abstract follows below:

Domains are the structural, functional, and evolutionary building blocks of protein sequences. Proteins can contain multiple domain instances, and duplications and losses of these domains are a key driver of protein evolution. Of particular interest are families of proteins with consecutive repeats of the same domain. These tandem repeat families are involved in a wide variety of functions, including transcriptional regulation, protein transport, muscle contraction, brain size regulation, and many others. Proteins with tandemly repeated domains form a significant portion of the proteome across the tree of life. Despite their prevalence and importance, the evolutionary histories and functional diversification of many of these protein families are largely unknown. Understanding when domains duplicate, whether individually or together as part of an array of domains, could yield deeper insights into the functions of these proteins.

Several attempts have been made to understand the evolution of repeat domains within protein sequences. These approaches can largely be categorized into sequence-based and reconciliation-based methods. Sequence-based approaches attempt to identify the existence of tandem duplications, without placing them in an evolutionary context. Reconciliation-based methods, on the other hand, use gene and domain trees to simultaneously infer both tandem duplication events and the genes they occurred in. These methods, while more powerful, have not accurately captured tandem duplication events.

In this work, we bridge the gap between these two methods, developing reconciliation-based methods that can accurately identify tandem domain duplication events while also placing them correctly in the evolutionary history of their gene families. We extend existing reconciliation frameworks to include flexible cost models for duplication events. Rather than fixed costs regardless of duplication size, we represent costs as arbitrary functions of duplication length. We tackle the problem of distinguishing tandem duplications from other duplication events by incorporating sequence position information from existing domains. We provide both exact solutions and fast, accurate heuristics to these problems.

Finally, we apply these approaches to the largest repeat domain family in humans, the Cys2-His2 zinc fingers. In analysis of 494 Cys2-His2 zinc finger orthogroups, we find evidence of numerous tandem domain duplications throughout the placental mammals.
Chaitanya Aluru will present his FPO "Reconciliation-Based Methods for Identifying the Evolutionary Origins of Tandem Duplications in Repeat Domain Families" on Friday, 12/17/2021 at 1:30PM via Zoom and in CS 105.

Zoom link: https://princeton.zoom.us/j/99860934740

The members of his committee are as follows:
Adviser: Mona Singh
Readers: Ben Raphael, Bernard Chazelle, Mona Singh
Examiners: Mona Singh, Yury Pritykin, and Barbara Engelhardt

A copy of his thesis is available upon request. Please email gradinfo@cs.princeton.edu if you would like a copy of the thesis.

Everyone is invited to attend the talk.
participants (1): Nicki Mahler