Yanjin Chen will present her MSE talk “Predictive modeling reveals novel sequence features of prime editing efficiency and precision” on Thursday, April 10, 2025 at 1:30pm in LSI 253.

The members of her committee are as follows: Yuri Pritykin (Adviser) and Britt Adamson (Reader). All are welcome to attend. Please see the abstract below.

Prime editing (PE) is an exciting genome editing technology that enables precise genetic modifications without requiring donor DNA or inducing double-strand breaks. However, its relatively low efficiency limits broader applications. PE efficiency largely depends on the design of the PE guide RNA (pegRNA), which consists of a primer binding site (PBS) and a reverse transcriptase template (RTT) that contains the intended edit. Various computational models have been developed to predict pegRNA efficiencies across different target sites and edit types. Most existing models, however, rely on deep learning, making it difficult to interpret the contributions of individual features and their combinatorial effects on editing efficiency. In addition, existing datasets cover only a limited portion of the sequence space for each edit length at any single target DNA locus, offering few insights into how the intended edit sequence influences PE efficiency.

Recently, in collaboration with Britt Adamson’s lab at Princeton, we acquired new PE screen datasets that include substitutions, insertions, and deletions of 1-25 bp at two target loci, in the K562 cell line and in a mismatch repair (MMR)-deficient variant of this cell line. This thesis aims to dissect the influence of sequence attributes of the desired edits and of cellular repair pathways on PE outcomes. We introduce generalized linear models with LASSO feature selection to identify important positional sequence features predictive of editing rates and unintended indel (insertion or deletion) rates at each target locus. Separate models were trained on insertion and substitution datasets.

Our results demonstrate that linear models achieve performance comparable to that of deep neural networks, suggesting that sequence features alone are sufficient determinants of editing efficiency and precision. Beyond predictive accuracy, linear models offer greater interpretability. Notably, we observe a new factor in both our dataset and previously published datasets: for insertion edits, incorporating cytosine (C) at the first editing position increases editing efficiency, providing a practical guideline for optimizing pegRNA design. Additionally, our model recapitulates the C-C mismatch preference in parental cell lines versus MMR-deficient cell lines. In summary, this work offers novel insights into how sequence composition and cellular repair mechanisms shape PE efficiency, which can help researchers design PE experiments that achieve higher efficiency.
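
To make the modeling idea in the abstract concrete, here is a minimal sketch of LASSO feature selection over positional one-hot sequence features. It is not the thesis implementation: it uses a plain Gaussian linear model (one special case of a generalized linear model) fitted by coordinate descent, and the sequences, the efficiency values, the penalty strength and all function names are invented for illustration. The toy data simply assume that a C at the first position raises efficiency from 0.2 to 0.3 above baseline, so that the selection step has something to find.

```python
from itertools import product

BASES = "ACGT"

def one_hot(seq):
    """Encode a sequence as position-by-base indicator features."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]

def feature_names(length):
    """Names matching the column order produced by one_hot."""
    return [f"pos{i + 1}_{b}" for i in range(length) for b in BASES]

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)*||y - b0 - Xw||^2 + lam*||w||_1.

    The intercept b0 is not penalized.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b0 = sum(y) / n
    resid = [y[i] - b0 for i in range(n)]
    for _ in range(n_iter):
        # Intercept update: absorb the mean of the current residual.
        shift = sum(resid) / n
        b0 += shift
        resid = [r - shift for r in resid]
        for j in range(p):
            col_sq = sum(X[i][j] ** 2 for i in range(n)) / n
            if col_sq == 0.0:
                continue
            # Correlation of feature j with the residual that excludes j.
            rho = sum(X[i][j] * (resid[i] + w[j] * X[i][j]) for i in range(n)) / n
            new_w = soft_threshold(rho, lam) / col_sq
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                w[j] = new_w
    return b0, w

# Toy data (hypothetical): every 3-nt sequence, with a made-up efficiency
# that is 0.2 at baseline and 0.3 higher when the first base is C.
seqs = ["".join(s) for s in product(BASES, repeat=3)]
y = [0.2 + (0.3 if s[0] == "C" else 0.0) for s in seqs]
X = [one_hot(s) for s in seqs]

intercept, weights = lasso_cd(X, y, lam=0.01)
names = feature_names(3)
selected = [name for name, w in zip(names, weights) if abs(w) > 1e-9]
print(selected)
```

On this noise-free toy design, the L1 penalty should drive every coefficient except `pos1_C` to exactly zero, and the surviving coefficient is shrunk somewhat below the simulated effect of 0.3. Reading off which positional features survive is what makes this kind of model easier to interpret than a deep network.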