Matthew Myers will present his FPO "Inferring Intra-tumor Heterogeneity from DNA Sequencing Data" on Monday, August 29, 2022 at 11:00 AM in CS 301 and on Zoom.

Zoom link: https://princeton.zoom.us/j/94547797233?pwd=TmFsOXJUWllQSWVseUQ0RnJIeDh1QT09

The members of Matthew’s committee are as follows:

Examiners: Ben Raphael (Adviser), Yuri Pritykin, Mona Singh

Readers: Olga Troyanskaya, Quaid Morris (Memorial Sloan Kettering Cancer Center)

A copy of his thesis is available upon request; please email gradinfo@cs.princeton.edu if you would like one.

Everyone is invited to attend his talk.

Abstract follows below:

Cancer is a disease characterized by somatic mutations that accumulate over time and in response to evolutionary pressures. As a result, tumors are composed of multiple distinct clones, each characterized by a different set of mutations. This intra-tumor heterogeneity is closely related to negative outcomes such as treatment resistance, relapse, and metastasis.

The somatic mutations that define tumor clones range greatly in size, from single-nucleotide variants (SNVs), which change only a single base position, to copy-number aberrations (CNAs), which can affect anywhere from thousands of bases to the whole genome. Researchers use DNA sequencing data to measure these somatic mutations and study intra-tumor heterogeneity. However, the vast majority of cancer sequencing uses DNA from bulk tumor samples, which are mixtures of millions of cells. Thus, the resulting data is a combination of DNA sequences from all tumor and normal cells in the sample. This presents challenges for analysis, as it is not immediately apparent from the sequencing reads which somatic mutations characterize individual clones. Recently, single-cell sequencing technologies have enabled researchers to obtain DNA sequencing reads from individual cells. However, these technologies have higher rates of sequencing errors and limited sequencing coverage, so sophisticated algorithms are still needed to recover the tumor clones and their mutations.

In this dissertation, we present three computational methods for inferring tumor clones and their constituent mutations from either bulk or single-cell DNA sequencing data. The first method, CALDER, infers tumor clones and their evolutionary relationships using SNVs from longitudinal bulk DNA sequencing samples. CALDER uses the longitudinal ordering to apply constraints on the clones present in each sample. The second method, SBMClone, infers tumor clones from SNVs in ultra-low-coverage single-cell DNA sequencing data using the stochastic block model, a well-studied tool from statistical physics and network science. The third method, HATCHet2, infers clones and their allele-specific CNAs from one or more bulk DNA sequencing samples. HATCHet2 improves upon the state of the art with several methodological innovations, including variable-width binning, locality-aware clustering, and a novel statistic for quantifying allelic imbalance that enables the identification of mirrored subclonal CNAs, in which different alleles are amplified in different clones.

Louis Riehl
Graduate Administrator
Computer Science Department, CS213
Princeton University
(609) 258-8014