Henri Schmidt will present his General Exam "Combinatorial models for reconstructing evolutionary histories" on Monday, May 8, 2023 at 11:00 AM in CS 301.

Committee Members: Ben Raphael (advisor), Yuri Pritykin, Olga Troyanskya

Abstract:

Reconstructing evolutionary histories is a fundamental problem in biology, studied across life's organizational hierarchy. Often, this evolutionary reconstruction task, referred to as phylogenetic inference, is framed as a combinatorial optimization problem under an appropriate evolutionary model. In this work, we examine two such phylogenetic inference problems, lineage tracing and the copy number tree problem, through the lens of combinatorial optimization.

CRISPR-Cas9 based genome editing combined with single-cell sequencing enables the tracing of the history of cell divisions, or lineage tracing, in tissues and whole organisms. While standard phylogenetic approaches may be applied to reconstruct cellular lineage trees from this data, the unique features of the CRISPR-Cas9 editing process motivate the development of specialized models that describe the evolution of CRISPR-Cas9 induced mutations. Here, we introduce the star homoplasy model, a novel evolutionary model that constrains a phylogenetic character to mutate at most once along a lineage, capturing the non-modifiability property of CRISPR-Cas9 mutations. We derive a combinatorial characterization of star homoplasy phylogenies by identifying a relationship between the star homoplasy model and the binary perfect phylogeny model. We use this characterization to develop an algorithm, Startle (Star tree lineage estimator), that computes a maximum parsimony star homoplasy phylogeny. We demonstrate that Startle outperforms other methods at lineage reconstruction on both real and simulated data.

Low-coverage single-cell DNA sequencing technologies enable the measurement of copy number profiles from thousands of individual cells within tumors. From this data, one can infer the evolutionary history of the tumor, referred to as a copy number phylogeny, by modeling transformations of the genome via copy number aberrations.  A widely used model to infer such copy number phylogenies is the copy number transformation (CNT) model. While the CNT model is useful, no efficient algorithm has been developed to find the most parsimonious phylogeny under the CNT model due to its difficult, combinatorial properties. Here, we introduce the zero-agnostic copy number transformation (ZCNT) model, a simplification of the CNT model that allows the amplification or deletion of genomic loci with zero copies. We use our simplified model to derive polynomial time algorithms for two natural relaxations of the small parsimony problem on copy number profiles. While the alteration of zero copy number regions allowed under the ZCNT model is not biologically realistic, we show on both simulated and real datasets that the ZCNT model is a close approximation to the CNT model. Extending our polynomial time algorithm for the ZCNT small parsimony problem, we develop an algorithm, Lazac, for solving the large parsimony problem on copy number profiles. We demonstrate that Lazac outperforms existing methods for inferring copy number phylogenies on both simulated and real data.

Reading List:

https://docs.google.com/document/d/1pHhFrh2M6rcrGaRpMyDlAvmB5l50sPs8Y-PHtIcJ4O4/edit?usp=sharing 

Everyone is invited to attend the talk, and those faculty wishing to remain for the oral exam following are welcome to do so.

Louis Riehl
Graduate Administrator
Computer Science Department, CS213
Princeton University
(609) 258-8014