Ankit Goyal will present his FPO "Towards Geometric Intelligence" on Monday, August 15, 2022 at 10:00 AM in Friend Center 202 and on Zoom.

Zoom link: https://princeton.zoom.us/my/agoyal

The members of Ankit’s committee are as follows:

Examiners: Jia Deng (Adviser), Szymon Rusinkiewicz, Danqi Chen

Readers: Karthik Narasimhan, Vladlen Koltun (Apple)

A copy of his thesis is available upon request; please email gradinfo@cs.princeton.edu if you would like one.

Everyone is invited to attend his talk.

The abstract follows:

Geometric intelligence is the aspect of human intelligence that relates to perceiving, communicating about, and reasoning about geometries. Artificial agents like robots must possess geometric intelligence to operate in unconstrained human environments and collaborate with humans in day-to-day life. Geometric intelligence is an umbrella term that encompasses many abilities. In this thesis, we focus on three crucial abilities: first, the ability to see or perceive geometry; second, the ability to communicate about geometry and space; and third, the ability to reason and plan about geometries.

For studying the ability to perceive geometry, we pursue two efforts. In one effort, we build a system called IFOR that recognizes the geometric difference between two scenes and rearranges one scene into the other. IFOR can handle unseen objects and transfer to the real world while being trained only on synthetic data. In another effort, we revisit the literature on perceiving objects from point clouds and uncover two surprising results. First, we show that auxiliary factors that are independent of network architecture explain most of the performance improvement. Second, we show that a simple view-based baseline outperforms sophisticated state-of-the-art methods.

For studying the ability to communicate about geometry, we focus on recognizing spatial relations, which are the atomic elements used to communicate about geometric arrangements. We find that existing datasets are insufficient as they lack large-scale, high-quality 3D ground truth information, which is critical for learning spatial relations. We fill this gap by constructing Rel3D.

Finally, for studying the ability to reason about geometry, we pursue two efforts. In one effort, we study the problem of geometric reasoning in the context of question-answering. We introduce Dynamic Spatial Memory Network (DSMN), a new deep network architecture designed for answering questions that admit latent visual representations. In another effort, we explore the problem of geometric planning, which requires simultaneous reasoning about geometries and planning. We find that existing benchmarks are insufficient for the problem. We propose PackIt, a virtual environment for geometric planning. We benchmark various baselines on PackIt and find that learning could be a viable way to gain geometric planning skills.

Louis Riehl
Graduate Administrator
Computer Science Department, CS213
Princeton University
(609) 258-8014