This will be a hybrid event with in-person attendance in Levine 307 and virtual attendance on Zoom.
ABSTRACT
What do diffusion models/flow matching, Gaussian splatting, and efficient transformer architectures have in common? Under the hood, they all turn a discrete set of points into a function defined everywhere. In the case of diffusion models/flow matching, the points are the training data points, and the function is the probability density. In the case of Gaussian splatting, the points are the splat centres, and the function is the volume density. In the case of efficient transformers, the points are the keys used by attention, and the function is the mapping from a query to its attention weights.
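To make the shared structure concrete, here is a minimal, illustrative sketch (my own, not a method from the talk): a handful of points plus a Gaussian kernel induce a function defined at every location, the same pattern that appears in different guises as a density over training data, a volume density over splat centres, or attention weights over keys.

```python
# Illustrative only: turning a discrete point set into a function defined everywhere
# by averaging Gaussian bumps centred on the points.
import numpy as np

def gaussian_kernel_function(points, bandwidth=0.5):
    """Return f(x) = average of Gaussian bumps centred on the given points."""
    points = np.asarray(points, dtype=float)          # shape (n, d)

    def f(x):
        x = np.atleast_2d(x)                          # shape (m, d)
        sq_dists = ((x[:, None, :] - points[None, :, :]) ** 2).sum(-1)  # (m, n)
        bumps = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
        return bumps.mean(axis=1)                     # a value for every query x

    return f

# Five 2-D "data points" induce a smooth function over the whole plane.
f = gaussian_kernel_function(np.random.randn(5, 2))
print(f(np.zeros((1, 2))))  # value of the induced function at the origin
```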
It turns out that *how* the gaps between points are filled in is critical. In this talk, I will show how seemingly innocent choices made in popular techniques give rise to profound consequences: they make diffusion models/flow matching data-hungry and slow to sample from, Gaussian splats hard to move and edit, and hashing-based efficient transformers error-prone. To address these issues, I will give an overview of three methods developed in my lab, Implicit Maximum Likelihood Estimation (IMLE), Proximity Attention Point Rendering (PAPR) and IceFormer, and show applications in few-shot image synthesis, trajectory prediction, visuomotor policy learning, novel view synthesis, 3D shape and albedo editing, scene interpolation and language modelling.
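To see why the choice of how gaps are filled can matter so much, the following toy comparison (again my own illustration, not an analysis from the talk) evaluates the induced function halfway between two points under two kernel widths: the points are identical, but one choice leaves the gap essentially empty while the other fills it in.

```python
# Toy illustration: two "innocent" gap-filling choices evaluated at the midpoint
# between the same two points give drastically different functions.
import numpy as np

points = np.array([[-1.0], [1.0]])   # two 1-D points with a gap between them
query = np.array([[0.0]])            # the midpoint of the gap

for bandwidth in (0.1, 1.0):
    sq_dists = ((query[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    value = np.exp(-sq_dists / (2.0 * bandwidth ** 2)).mean(axis=1)
    print(f"bandwidth {bandwidth}: value in the gap = {value[0]:.6f}")
# bandwidth 0.1 -> ~0 (the gap is left empty); bandwidth 1.0 -> ~0.61 (smoothly filled)
```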