[talks] Paul Ellenbogen will present his general exam on Wednesday, May 18, 2016 at 10:15am in CS 302.

Nicki Gotsis ngotsis at CS.Princeton.EDU
Wed May 11 15:36:48 EDT 2016


Paul Ellenbogen will present his general exam on Wednesday, May 18, 2016 at 10:15am in CS 302.

The members of his committee are Arvind Narayanan (adviser), Barbara Engelhardt, and Prateek Mittal.

Everyone is invited to attend his talk, and those faculty wishing to remain for the oral exam following are welcome to do so.  His abstract and reading list follow below.

ABSTRACT:

DNA fingerprinting is the practice of mapping unidentified genetic material
to its owner. Currently deployed DNA fingerprinting techniques utilize only
a small amount of the information available in the human genome, and leverage
none of the existing genealogical information. The only other work that has
attempted to use genealogical information is limited to inferring the
surnames of males, using the heuristic that the surname is correlated with
the Y chromosome. We present the first general algorithm for DNA
fingerprinting using a small set (~1% of the population) of known genomes
in combination with comprehensive genealogical information. With
genealogical data and recombination data from the HapMap project, we can
repeatedly simulate a pair of individuals’ genomes, allowing us to learn
the distribution of shared genetic material between the pair. Using strong
independence assumptions and the simulation-generated distributions, we can
estimate the joint distribution of shared genetic material for any
individual’s genome compared against a set of individuals with known
genomes. Unknown genomes can then be matched with their owner by
calculating the amount of shared genetic material with each of the known
genomes, and picking the individual whose joint distribution predicts the
highest probability for that amount of shared genetic material. Thus, we
are able to identify individuals using a relatively small number of
genomes. This has privacy implications, as it shows that an individual
releasing their genetic data may compromise the privacy of their relatives.
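
The matching step described above can be illustrated with a minimal
sketch. It is not the implementation behind this talk: the function
names, the Gaussian summary of each simulated distribution, and the toy
numbers are all assumptions made for illustration, and simulate_shared
is a hypothetical stand-in for a real simulation driven by genealogical
and recombination data.

```python
import math
import random


def simulate_shared(expected, rng, n=500):
    """Hypothetical stand-in for the pedigree simulation: draws n noisy,
    non-negative amounts of shared material around an assumed expectation."""
    spread = 0.1 * expected + 5.0
    return [max(0.0, rng.gauss(expected, spread)) for _ in range(n)]


def fit_gaussian(samples, min_std=1.0):
    """Summarize simulated amounts as (mean, std). The Gaussian form is an
    assumption; the floor on std avoids zero-width distributions."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, max(math.sqrt(var), min_std)


def log_density(x, mean, std):
    """Log of the normal density at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)


def joint_log_likelihood(observed, models):
    """Independence assumption: the joint log-probability is the sum of the
    log-densities for each known genome."""
    return sum(log_density(observed[ref], *models[ref]) for ref in models)


def identify(observed, candidate_models):
    """Return the candidate whose joint distribution gives the observed
    sharing amounts the highest probability."""
    return max(
        candidate_models,
        key=lambda c: joint_log_likelihood(observed, candidate_models[c]),
    )


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up expected sharing (arbitrary units) between each candidate and
    # two known genomes; real values would come from the genealogy.
    expected = {
        "alice": {"ref1": 850.0, "ref2": 0.0},
        "bob": {"ref1": 0.0, "ref2": 850.0},
    }
    models = {
        cand: {ref: fit_gaussian(simulate_shared(e, rng)) for ref, e in refs.items()}
        for cand, refs in expected.items()
    }
    # Sharing measured between the unknown genome and each known genome.
    print(identify({"ref1": 860.0, "ref2": 5.0}, models))
```

Summing log-densities rather than multiplying probabilities keeps the
comparison numerically stable when many known genomes are involved.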

READING LIST:

Book:

Hartl, Daniel L., and A. G. Clark. *Principles of Population Genetics*.
Sinauer, Sunderland, MA (2007). Chapters 2.1, 2.2, 2.5-2.7,
6, 9.2, 1


Articles:


   1. Malin, Bradley. "Re-identification of familial database records."
   *AMIA*. 2006.
   2. Homer, Nils, et al. "Resolving individuals contributing trace amounts
   of DNA to highly complex mixtures using high-density SNP genotyping
   microarrays." *PLoS Genet* 4.8 (2008): e1000167.
   3. Joh, Elizabeth E. "DNA Theft: Recognizing the Crime of Nonconsensual
   Genetic Collection and Testing." *B.U. L. Rev.* 91 (2011): 665.
   4. Durand, Eric Y., Nicholas Eriksson, and Cory Y. McLean. "Reducing
   pervasive false-positive identical-by-descent segments detected by
   large-scale pedigree analysis." *Molecular Biology and Evolution*
   (2014): msu151.
   5. Humbert, Mathias, et al. "De-anonymizing Genomic Databases Using
   Phenotypic Traits." *Proceedings on Privacy Enhancing Technologies*
   2015.2 (2015): 99-114.
   6. Manichaikul, Ani, et al. "Robust relationship inference in
   genome-wide association studies." *Bioinformatics* 26.22 (2010):
   2867-2873.
   7. Palamara, Pier Francesco, et al. "Length distributions of identity by
   descent reveal fine-scale demographic history." *The American Journal of
   Human Genetics* 91.5 (2012): 809-822.
   8. Joh, Elizabeth E. "Reclaiming 'Abandoned' DNA: The Fourth Amendment and
   Genetic Privacy." *Northwestern University Law Review* 100 (2006): 857.
   9. Gymrek, Melissa, et al. "Identifying personal genomes by surname
   inference." *Science* 339.6117 (2013): 321-324.
   10. Humbert, Mathias, et al. "Addressing the concerns of the Lacks
   family: quantification of kin genomic privacy." *Proceedings of the 2013
   ACM SIGSAC conference on Computer & communications security*. ACM, 2013.
   11. Sweeney, Latanya, Akua Abu, and Julia Winn. "Identifying
   participants in the personal genome project by name." *Available at SSRN
   2257732* (2013).

