CS Colloquium: Shuran Song, Monday, Nov 7 at 12:30pm
CS Colloquium Series
Speaker: Shuran Song, Columbia University
Date: Monday, November 7
Time: 12:30pm
Location: CS 105
Host: Szymon Rusinkiewicz
Event page: https://www.cs.princeton.edu/events/26279
Title: Learning Meets Gravity: Robots that Embrace Dynamics from Pixels
Abstract: Despite the incredible capabilities (speed, repeatability) of our hardware, most robot manipulators today are deliberately programmed to avoid dynamics, moving slowly enough to adhere to quasi-static assumptions about the world. In contrast, people frequently (and subconsciously) make use of dynamic phenomena to manipulate everyday objects, from unfurling blankets to tossing trash, to improve efficiency and physical reach. These abilities are made possible by an intuition of physics, a cornerstone of intelligence. How do we impart the same to robots? In this talk, I will discuss how we might enable robots to leverage dynamics for manipulation in unstructured environments. Modeling the complex dynamics of unseen objects from pixels is challenging. However, by tightly integrating perception and action, we show it is possible to relax the need for accurate dynamical models, thereby allowing robots to (i) learn dynamic skills for complex objects, (ii) adapt to new scenarios using visual feedback, and (iii) use their dynamic interactions to improve their understanding of the world. By changing the way we think about dynamics, from avoiding it to embracing it, we can simplify a number of classically challenging problems, leading to new robot capabilities.
Bio: Shuran Song is an Assistant Professor in the Department of Computer Science at Columbia University. Before that, she received her Ph.D. in Computer Science at Princeton University and her BEng at HKUST. Her research interests lie at the intersection of computer vision and robotics. Song's research has been recognized through several awards, including Best Paper Awards at RSS'22 and T-RO'20; Best System Paper Awards at CoRL'21, RSS'19, and Amazon Robotics'18; and best paper finalist selections at RSS'22, ICRA'20, CVPR'19, RSS'19, and IROS'18. She is also a recipient of the NSF CAREER Award, as well as research awards from Microsoft, Toyota Research, Google, Amazon, JP Morgan, and the Sloan Foundation. To learn more about Shuran's work, please visit https://www.cs.columbia.edu/~shurans/
CS Colloquium Series
Speaker: John Ingraham, Generate Biomedicines
Date: Thursday, November 10
Time: 12:30pm
Location: CS 105
Host: Ellen Zhong
Event page: https://www.cs.princeton.edu/events/26267
Title: Illuminating protein space with a programmable generative model
Abstract: Three billion years of evolution have produced a tremendous diversity of protein molecules, but it is not yet known how thoroughly evolution has sampled the space of possible protein folds and functions. Here, by introducing a new, scalable generative prior for proteins and protein complexes, we provide further evidence that Earth's extant molecular biodiversity represents only a small fraction of what is possible for polypeptides. To enable this, we introduce customized neural networks that enable long-range reasoning, that respect the statistical structure of polymer ensembles, and that can efficiently realize 3D structures of proteins from predicted geometries. We show how this framework broadly enables protein design under auxiliary constraints, which can be any composition of semantics, substructure, symmetries, shape, and even natural language prompts.
Bio: John Ingraham is the Head of Machine Learning at Generate Biomedicines, Inc., where he leads a team of scientists and engineers developing new kinds of machine learning systems for protein design. He has spent most of his career developing structured statistical models of the rich diversity found in protein sequences and structures, including as a postdoc at MIT CSAIL with Tommi Jaakkola and Regina Barzilay, working on some of the first generative models for structure-based sequence design, and before that in his PhD with Debora Marks at Harvard Medical School, developing deep learning and statistical-physics-inspired models of deep evolutionary sequence variation and protein folding.
This talk will not be recorded or live-streamed.
CS Colloquium Series
Speaker: Adam Tauman Kalai, Microsoft Research
Date: Monday, November 21
Time: 12:30pm
Location: CS 105
Host: Elad Hazan
Event page: https://www.cs.princeton.edu/events/26277
Title: The Power of Intelligent Language Models
Abstract: Recently, large language models have been trained on intelligent languages, including natural languages such as English and programming languages such as Python. We will examine several interesting applications of these models. First, they can be used to enumerate human stereotypes and discriminatory biases, suggesting that they must be used carefully. Second, they can be used to generate and solve their own programming puzzles, which can be used in a self-training pipeline to solve increasingly challenging algorithmic programming problems. Third, we illustrate how they can be used to simulate numerous human participants in classic behavioral economics and psychology experiments, such as the ultimatum game, risk aversion, garden path sentences, and the Milgram shock experiment. Finally, we discuss future directions in using these language models to understand intelligent animal communication in connection with Project CETI, which aims to understand the communication of sperm whales.
Bio: Adam Tauman Kalai is a Senior Principal Researcher at Microsoft Research New England. His research includes work on machine learning, artificial intelligence, and algorithms. He received his BA from Harvard and PhD from CMU. He has served as an Assistant Professor at Georgia Tech and the Toyota Technological Institute at Chicago. He has co-chaired AI and crowdsourcing conferences including the Conference on Learning Theory (COLT), the Conference on Human Computation and Crowdsourcing (HCOMP), and New England Machine Learning Day (NEML). His honors include an NSF CAREER award and an Alfred P. Sloan fellowship.
This talk will be live-streamed via Zoom webinar here: https://princeton.zoom.us/j/94031287434
CS Colloquium Series
Speaker: Pat Hanrahan, Stanford University
Date: Monday, November 14
Time: 4:30pm
Location: Friend Center, room 101
Host: Adam Finkelstein
Event page: https://www.cs.princeton.edu/events/26268
Title: Shading Languages and the Emergence of Programmable Graphics Systems
Abstract: A major challenge in using computer graphics for movies and games is to create a rendering system that can create realistic pictures of a virtual world. The system must handle the variety and complexity of the shapes, materials, and lighting that combine to create what we see every day. The images must also be free of artifacts, emulate cameras to create depth of field and motion blur, and compose seamlessly with photographs of live action. Pixar's RenderMan was created for this purpose and has been widely used in feature film production. A key innovation in the system is to use a shading language to procedurally describe appearance. Shading languages were subsequently extended to run in real time on graphics processing units (GPUs), and now shading languages are widely used in game engines. The final step was the realization that the GPU is a data-parallel computer, and that the shading language could be extended into a general-purpose data-parallel programming language. This enabled a wide variety of applications in high-performance computing, such as physical simulation and machine learning, to be run on GPUs. Nowadays, GPUs are the fastest computers in the world. This talk will review the history of shading languages and GPUs, and discuss the broader implications for computing.
Bio: Pat Hanrahan is the Canon Professor of Computer Science and Electrical Engineering in the Computer Graphics Laboratory at Stanford University. His research focuses on rendering algorithms, graphics systems, and visualization. Hanrahan received a Ph.D. in biophysics from the University of Wisconsin-Madison in 1985. As a founding employee at Pixar Animation Studios in the 1980s, Hanrahan led the design of the RenderMan Interface Specification and the RenderMan Shading Language. In 1989, he joined the faculty of Princeton University. In 1995, he moved to Stanford University. More recently, Hanrahan served as a co-founder and CTO of Tableau Software. He has received three Academy Awards for Science and Technology, the SIGGRAPH Computer Graphics Achievement Award, the SIGGRAPH Stephen A. Coons Award, and the IEEE Visualization Career Award. He is a member of the National Academy of Engineering and the American Academy of Arts and Sciences. In 2019, he received the ACM A. M. Turing Award.
This talk will be live streamed on Princeton University Media Central ( https://mediacentral.princeton.edu/ ). See https://mediacentrallive.princeton.edu/
CS Colloquium Series
Speaker: Richard L. Sites
Date: Tuesday, November 15
Time: 12:30pm
Location: CS 105
Host: Brian Kernighan
Event page: https://www.cs.princeton.edu/events/26269
Title: Making the Invisible Visible: Observing Complex Software Dynamics
Abstract: From mobile and cloud apps to video games to driverless vehicle control, more and more software is time-constrained: it must deliver reliable results seamlessly, consistently, and virtually instantaneously. If it doesn't, customers are unhappy, and sometimes lives are put at risk. When complex software underperforms or fails, identifying the root causes is difficult and, historically, few tools have been available to help, leaving application developers to guess what might be happening. How can we do better? The key is to have low-overhead observation tools that can show exactly where all the elapsed time goes, in both normal responses and delayed responses. Doing so makes visible each of the seven possible reasons for such delays, as we show.
Bio: Richard L. Sites wrote his first computer program in 1959 and has spent most of his career at the boundary between hardware and software, with a particular interest in CPU/software performance interactions. His past work includes writing VAX microcode, co-architecting the DEC Alpha, and inventing the performance counters found in nearly all processors today. He has done low-overhead microcode and software tracing at DEC, Adobe, Google, and Tesla. Dr. Sites earned his PhD at Stanford in 1974; he holds 66 patents and is a member of the US National Academy of Engineering.
This talk will be live streamed on Princeton University Media Central ( https://mediacentral.princeton.edu/ ). See https://mediacentrallive.princeton.edu/
CS Colloquium Series
Speaker: Luke Zettlemoyer, University of Washington
Date: Thursday, November 17
Time: 12:30pm
Location: Friend Center Convocation room
Host: Danqi Chen
Event page: https://www.cs.princeton.edu/events/26270
Title: Large Language Models: Will they keep getting bigger? And, how will we use them if they do?
Abstract: The trend of building ever larger language models has dominated much research in NLP over the last few years. In this talk, I will discuss our recent efforts to (at least partially) answer two key questions in this area: Will we be able to keep scaling? And, how will we actually use the models if we do? I will cover our recent efforts on learning new types of sparse mixture-of-experts (MoE) models. Unlike model-parallel algorithms for learning dense models, which are very difficult to further scale with existing hardware, our sparse approaches have significantly reduced cross-node communication costs and could possibly provide the next big leap in performance, although finding a version that scales well in practice remains an open challenge. I will also present our recent work on prompting language models in a way that better controls for surface-form variation, to improve the performance of models that are so big we can only afford to do inference, with little to no task-specific fine-tuning. Finally, time permitting, I will discuss work on new forms of supervision for language model training, including learning from the hypertext and multi-modal structure of web pages to provide new signals for both learning and prompting the model. Together, these methods present our best guesses for how to keep the scaling trend alive as we move forward to the next generation of NLP models. This talk describes work done at the University of Washington and Meta, primarily led by Armen Aghajanyan, Suchin Gururangan, Ari Holtzmann, Mike Lewis, Margaret Li, Sewon Min, and Peter West.
Bio: Luke Zettlemoyer is a Professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, and a Research Director at Meta. His research focuses on empirical methods for natural language semantics, and involves designing machine learning algorithms, introducing new tasks and datasets, and, most recently, studying how to best develop self-supervision signals for pre-training. His honors include being named an ACL Fellow as well as winning a PECASE award, an Allen Distinguished Investigator award, and multiple best paper awards. Luke received his PhD from MIT and was a postdoc at the University of Edinburgh.
This talk will be live streamed on Princeton University Media Central ( https://mediacentral.princeton.edu/ ). See https://mediacentrallive.princeton.edu/
**Please note, lunch will be available at 12:00pm in the Friend Center Convocation room before the talk.**
-
Emily C. Lawrence