Drag the slider left and right to toggle the CoACD convex decomposition used for collision handling.
We present the first generative scene-synthesis algorithm that guarantees every sample satisfies hard spatial constraints, without projections or other post-hoc optimizations that introduce unknown levels of distribution shift.
Many popular generative modeling frameworks can be summarized as defining a transformation from a dataset that is intractable to sample from to one that is tractable to sample from, then learning to invert this transformation (or to parameterize a posterior it defines), thereby yielding a tractable sampling algorithm. To guarantee samples that always satisfy hard spatial constraints, we build on this broad construction: if the transformation itself respects the constraints, so does its final state. Concretely, we decompose complicated geometry (e.g. meshes) into sets of convex hulls for fast collision detection, define an iterable noising simulation that preserves initial-velocity information, and train a model to denoise scenes by predicting the simulation's final velocities given the final object positions. Negating these predicted velocities and feeding them back into the simulation provably yields zero reconstruction error (up to controllable floating-point error in practice), giving our sampling algorithm.
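The velocity-negation idea above can be illustrated with a minimal sketch. This is not the paper's simulator: the names `simulate` and `denoise` are hypothetical, the motion is collision-free constant-velocity integration (the actual method resolves collisions among the convex hulls), and the true final velocities stand in for a learned model's predictions. It only demonstrates why negating final velocities and rerunning the same simulation retraces the forward trajectory, up to floating-point error.

```python
import numpy as np

def simulate(positions, velocities, n_steps, dt=0.01):
    """Forward 'noising' pass: explicit Euler integration of
    constant-velocity motion (collision response omitted for brevity).
    Returns the final positions and final velocities."""
    x = positions.copy()
    v = velocities.copy()
    for _ in range(n_steps):
        x = x + v * dt
    return x, v

def denoise(final_positions, predicted_final_velocities, n_steps, dt=0.01):
    """Reverse pass: negate the (predicted) final velocities and rerun
    the same simulation, which retraces the forward trajectory."""
    x, _ = simulate(final_positions, -predicted_final_velocities, n_steps, dt)
    return x

rng = np.random.default_rng(0)
x0 = rng.normal(size=(8, 3))            # 8 objects in 3D (toy scene)
v0 = rng.normal(size=(8, 3))            # sampled initial velocities
xT, vT = simulate(x0, v0, n_steps=100)  # noise the scene
x_rec = denoise(xT, vT, n_steps=100)    # oracle: true final velocities
err = np.abs(x_rec - x0).max()
print(err)                               # tiny accumulated rounding error
```

Because each forward step is undone by the matching reversed step, reconstruction is exact in exact arithmetic; in floating point the residual is a small accumulated rounding error, matching the paper's "controllable floating point errors in practice."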
We evaluate our method on the 3D-FRONT dataset, reporting metrics that measure object overlap, scene quality, scene diversity, and adherence to text prompts under conditional sampling.