This was a hybrid event with in-person attendance in Levine 307 and virtual attendance…
Large-scale generative visual models have made content creation as effortless as writing a short text description. However, these models are typically trained on enormous amounts of Internet data, often including copyrighted material, licensed images, and personal photos. How can we remove these images if creators decide to opt out? How can we properly compensate them if they choose to opt in?
In this talk, I will first describe an efficient method for removing copyrighted materials, the artistic styles of living artists, and memorized images from pretrained text-to-image models. I will then discuss our data attribution algorithms for assessing the influence of each training image on a generated sample. Collectively, these efforts aim to enable creators to retain control over the ownership and use of their images.