*This seminar will be held in-person ONLY in Raisler Lounge. The seminar will NOT be recorded.*
In the 1980s and 1990s, computer vision architectures were built on sparse samples of points. In the 2000s, dense models became popular for visual recognition, since heuristically defined sparse models do not cover all the important parts of an image. With deep learning and end-to-end training, however, this no longer has to be the case: sparse models may still offer significant advantages, saving unnecessary computation while being more flexible. In this talk, I will present the deep point cloud convolutional backbones we have developed over the past few years, including our most recent work, PointConvFormer, which outperforms grid-based convolutional approaches. As applications of these point-based networks, I will discuss two recent works. The first, AutoFocusFormer, uses point cloud backbones and decoders for 2D image recognition, with a novel module that enables end-to-end learning of adaptive downsampling. This is very helpful for detecting tiny objects far away in the scene, which would have been decimated by conventional grid downsampling. Finally, I will illustrate the use of point convolution backbones in generative models with a recent work on diverse point cloud completion.
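To give a flavor of what a point cloud convolution does, here is a minimal NumPy sketch of a PointConv-style layer: convolution weights are generated by a small MLP on the relative coordinates of each point's nearest neighbors, then used to aggregate neighbor features. This is an illustrative simplification, not the actual PointConv or PointConvFormer implementation; the function names (`knn`, `point_conv`) and the tiny two-layer weight MLP are assumptions made for the example.

```python
import numpy as np

def knn(points, k):
    """Indices of the k nearest neighbors of each point (including itself)."""
    # pairwise squared distances, shape (N, N)
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    return np.argsort(d2, axis=1)[:, :k]

def point_conv(points, features, k, w1, b1, w2, b2):
    """One PointConv-style layer (illustrative sketch):
    an MLP on relative neighbor coordinates produces per-neighbor
    weights, which are applied to neighbor features and summed."""
    idx = knn(points, k)                    # (N, k) neighbor indices
    rel = points[idx] - points[:, None, :]  # (N, k, 3) relative coordinates
    h = np.maximum(rel @ w1 + b1, 0.0)      # (N, k, H) hidden layer with ReLU
    w = h @ w2 + b2                         # (N, k, C) weight per neighbor/channel
    neigh = features[idx]                   # (N, k, C) gathered neighbor features
    return (w * neigh).sum(axis=1)          # (N, C) weighted aggregation

rng = np.random.default_rng(0)
pts = rng.normal(size=(64, 3))      # 64 points in 3D
feat = rng.normal(size=(64, 8))     # 8 feature channels per point
out = point_conv(pts, feat, k=8,
                 w1=0.1 * rng.normal(size=(3, 16)), b1=np.zeros(16),
                 w2=0.1 * rng.normal(size=(16, 8)), b2=np.zeros(8))
print(out.shape)  # → (64, 8)
```

Because the weights depend only on relative geometry, the layer applies to irregular point sets directly, with no voxel or pixel grid required.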