[talks] Profiling big data systems at Google - Talk today at 3:30pm

Nicole E. Wagenblast nwagenbl at CS.Princeton.EDU
Wed May 1 11:52:26 EDT 2013


Profiling Latency in Deployed Distributed Systems 
Gideon Mann, Google
Wednesday, May 1, 2013, 3:30pm 
Computer Science 105 


Understanding the sources of latency within a deployed distributed system is complicated. Asynchronous control flow, variable workloads, pushes of new backend servers, and unreliable hardware can all contribute significantly to a job's performance. In this talk, I'll present the work of the Weatherman effort to build a profiling tool for deployed distributed systems. The method uses distributed traces to estimate the code's control flow and to predict and explain observed performance. I'll then illustrate how this method has been applied to understand and tune large distributed systems at Google, and how it has been used for differential profiling to understand the sources of latency changes.


To provide another view of latency, I'll briefly discuss our recent work on distributed convex optimization, with an emphasis on the interface between the algorithm and the computing substrate that runs it. In particular, I'll show that data center architecture, especially network architecture, should have a significant impact on machine learning algorithm design.



Gideon is a Staff Research Scientist at Google NY. He attended Brown University as an undergraduate, where he hung out in the AI lab and drank too much Mountain Dew. He then attended graduate school at Johns Hopkins University, worked in CLSP, and graduated in 2006 with a Ph.D. He still misses Charm City. He then did a postdoc at UMass Amherst with Andrew McCallum, working on weakly supervised learning. In 2007, he joined Google.


At Google, his team works on applied machine learning. The Weatherman effort applies statistical methods to data center management. The team is also responsible for the Prediction API (https://developers.google.com/prediction/). Publicly released in 2010, Prediction was an early machine-learning-as-a-service offering and remains an ongoing research project.

