Paul Ellenbogen will present his general exam on Wednesday, May 18, 2016 at 10:15am in CS 302. The members of his committee are Arvind Narayanan (adviser), Barbara Engelhardt, and Prateek Mittal.

Everyone is invited to attend his talk, and faculty who wish to remain for the oral exam that follows are welcome to do so. His abstract and reading list follow below.

ABSTRACT:

DNA fingerprinting is the practice of mapping unidentified genetic material to its owner. Currently deployed DNA fingerprinting techniques use only a small amount of the information available in the human genome and leverage none of the existing genealogical information. The only other work that has attempted to use genealogical information is limited to inferring the surnames of males, using the heuristic that the surname is correlated with the Y chromosome.

We present the first general algorithm for DNA fingerprinting using a small set (~1% of the population) of known genomes in combination with comprehensive genealogical information. With genealogical data and recombination data from the HapMap project, we can repeatedly simulate a pair of individuals’ genomes, allowing us to learn the distribution of shared genetic material between the pair. Using strong independence assumptions and the simulation-generated distributions, we can estimate the joint distribution of shared genetic material for any individual’s genome compared against a set of individuals with known genomes. An unknown genome can then be matched with its owner by calculating the amount of genetic material it shares with each of the known genomes, and picking the individual whose joint distribution assigns the highest probability to those amounts.

Thus, we are able to identify individuals using a relatively small number of genomes. This has privacy implications, as it shows that an individual releasing their genetic data may compromise the privacy of their relatives.
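The matching step described in the abstract can be illustrated with a minimal Python sketch. This is not the presenter's implementation. The function names, the Gaussian fit, the spread values, and the toy pedigree are all hypothetical, and `simulate_shared` is only a stand-in for the pedigree simulation with HapMap recombination data. The sketch shows how, under the independence assumption, per-pair distributions combine into a joint log-likelihood and the candidate with the highest value is selected.

```python
import math
import random


def simulate_shared(expected_fraction, rng, n_sims=2000):
    """Hypothetical stand-in for genome simulation over a pedigree.

    Draws noisy shared-genome fractions around the expected value for a
    relationship. The spread formula is an arbitrary illustrative choice.
    """
    spread = 0.02 + 0.25 * expected_fraction
    return [max(0.0, rng.gauss(expected_fraction, spread)) for _ in range(n_sims)]


def fit_gaussian(samples):
    """Summarize simulated sharing by a mean and standard deviation (assumed model)."""
    mean = sum(samples) / len(samples)
    var = sum((s - mean) ** 2 for s in samples) / (len(samples) - 1)
    return mean, math.sqrt(var)


def log_density(x, mean, std):
    """Log of the Gaussian density at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def best_match(observed, models):
    """Pick the candidate whose joint distribution best explains the observation.

    Independence assumption: the joint log-likelihood is the sum of the
    per-known-genome log-densities.
    """
    scores = {
        cand: sum(log_density(observed[k], *params[k]) for k in observed)
        for cand, params in models.items()
    }
    return max(scores, key=scores.get), scores


# Toy pedigree: expected shared fraction between each candidate owner and
# each known genome (0.5 for full siblings, 0.125 for first cousins, 0 if unrelated).
pedigree = {
    "alice": {"K1": 0.5, "K2": 0.125},
    "bob": {"K1": 0.125, "K2": 0.5},
    "carol": {"K1": 0.0, "K2": 0.0},
}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
models = {
    cand: {k: fit_gaussian(simulate_shared(frac, rng)) for k, frac in rel.items()}
    for cand, rel in pedigree.items()
}

# Sharing measured between the unknown genome and each known genome (made-up values).
observed = {"K1": 0.48, "K2": 0.11}
best, scores = best_match(observed, models)
print(best, scores)
```

Summing log-densities rather than multiplying probabilities keeps the computation numerically stable when many known genomes are compared.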
READING LIST:

Book:

Hartl, Daniel L., and A. G. Clark. *Principles of Population Genetics*. Sinauer, Sunderland, MA (2007). Chapters 2.1, 2.2, 2.5-2.7, 6, 9.2, 1

Articles:

1. Malin, Bradley. "Re-identification of familial database records." *AMIA*. 2006.
2. Homer, Nils, et al. "Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays." *PLoS Genet* 4.8 (2008): e1000167.
3. Joh, Elizabeth E. "DNA Theft: Recognizing the Crime of Nonconsensual Genetic Collection and Testing." *B.U. L. Rev.* 91 (2011): 665.
4. Durand, Eric Y., Nicholas Eriksson, and Cory Y. McLean. "Reducing pervasive false-positive identical-by-descent segments detected by large-scale pedigree analysis." *Molecular Biology and Evolution* (2014): msu151.
5. Humbert, Mathias, et al. "De-anonymizing Genomic Databases Using Phenotypic Traits." *Proceedings on Privacy Enhancing Technologies* 2015.2 (2015): 99-114.
6. Manichaikul, Ani, et al. "Robust relationship inference in genome-wide association studies." *Bioinformatics* 26.22 (2010): 2867-2873.
7. Palamara, Pier Francesco, et al. "Length distributions of identity by descent reveal fine-scale demographic history." *The American Journal of Human Genetics* 91.5 (2012): 809-822.
8. Joh, Elizabeth E. "Reclaiming 'Abandoned' DNA: The Fourth Amendment and Genetic Privacy." *Northwestern University Law Review* 100 (2006): 857.
9. Gymrek, Melissa, et al. "Identifying personal genomes by surname inference." *Science* 339.6117 (2013): 321-324.
10. Humbert, Mathias, et al. "Addressing the concerns of the Lacks family: quantification of kin genomic privacy." *Proceedings of the 2013 ACM SIGSAC Conference on Computer & Communications Security*. ACM, 2013.
11. Sweeney, Latanya, Akua Abu, and Julia Winn. "Identifying participants in the personal genome project by name." *Available at SSRN 2257732* (2013).