GALA: Geometry-Aware Local Adaptive Grids for Detailed 3D Generation

1Simon Fraser University 2ETH Zurich
Overview figure: concept of our method.

Given a watertight mesh (a), our representation, GALA (Geometry-Aware Local Adaptive grids), distributes a set of root-node voxels (coral) to cover the mesh surface. An octree subdivision is applied within each root voxel, with a subset shown in (c). In each non-empty octree leaf node (green), a local grid (red dots) is oriented and anisotropically scaled to adapt to, and tightly bound, the local surface geometry. Only 277K parameters with 8-bit quantization yield an accurate representation (e).
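For readers who prefer code, the hierarchy above can be pictured as plain data structures, as in the minimal Python sketch below. This is an illustration only, not the authors' C++/CUDA implementation; the class names (RootVoxel, OctreeNode, LocalGrid), the quaternion parameterization of the grid orientation, and the local grid resolution m are assumptions based on the description above.

# Minimal sketch of the GALA hierarchy as plain Python data structures.
# All names and the grid resolution m are illustrative assumptions,
# not the authors' C++/CUDA implementation.
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np

@dataclass
class LocalGrid:
    """Geometry-aware grid attached to a non-empty octree leaf."""
    center: np.ndarray        # (3,) grid origin p_g in world coordinates
    rotation: np.ndarray      # (4,) quaternion q orienting the grid axes
    scale: np.ndarray         # (3,) anisotropic axis scales s_g
    values: np.ndarray        # (m, m, m) scalar samples stored on the grid

@dataclass
class OctreeNode:
    depth: int
    children: Optional[List["OctreeNode"]] = None   # None for leaf nodes
    grid: Optional[LocalGrid] = None                 # set only on non-empty leaves

@dataclass
class RootVoxel:
    """Coarse voxel (p, s) placed on the object boundary; owns one octree."""
    position: np.ndarray      # (3,) voxel center p
    size: float               # scalar edge length s
    octree: OctreeNode = field(default_factory=lambda: OctreeNode(depth=0))

# A GALA shape is simply the sparse forest of root voxels covering the surface.
GALAShape = List[RootVoxel]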

Abstract

We propose GALA, a novel representation of 3D shapes that (i) excels at capturing and reproducing complex geometry and surface details, (ii) is computationally efficient, and (iii) lends itself to 3D generative modelling with modern, diffusion-based schemes. The key idea of GALA is to exploit both the global sparsity of surfaces within a 3D volume and their local surface properties. Sparsity is promoted by covering only the 3D object boundaries, not empty space, with an ensemble of tree root voxels. Each voxel contains an octree to further limit storage and compute to regions that contain surfaces. Adaptivity is achieved by fitting one local and geometry-aware coordinate frame in each non-empty leaf node. Adjusting the orientation of the local grid, as well as the anisotropic scales of its axes, to the local surface shape greatly increases the amount of detail that can be stored in a given amount of memory, which in turn allows for quantization without loss of quality. With our optimized C++/CUDA implementation, GALA can be fitted to an object in less than 10 seconds. Moreover, the representation can efficiently be flattened and manipulated with transformer networks. We provide a cascaded generation pipeline capable of generating 3D shapes with great geometric detail.
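To make the role of the local frames concrete, the sketch below shows how a query point could be evaluated against one of the local grids defined above: transform the point into the leaf's oriented, anisotropically scaled frame, then interpolate the stored samples. Trilinear interpolation, the [-1, 1] local coordinate convention, and scipy's (x, y, z, w) quaternion format are assumptions for illustration; the paper defines the exact field and sampling scheme.

# Sketch of evaluating a query point against one local grid: map the point
# into the leaf's oriented, anisotropically scaled frame, then trilinearly
# interpolate the stored samples. The conventions here are assumptions.
import numpy as np
from scipy.spatial.transform import Rotation

def query_local_grid(point, grid):
    # World -> local frame: translate, rotate, then undo the per-axis scales.
    local = Rotation.from_quat(grid.rotation).inv().apply(point - grid.center)
    local = local / grid.scale                    # assumed to lie roughly in [-1, 1]^3

    m = grid.values.shape[0]
    # Map local coordinates to continuous grid indices in [0, m - 1].
    idx = (local + 1.0) * 0.5 * (m - 1)
    idx = np.clip(idx, 0.0, m - 1.001)
    i0 = np.floor(idx).astype(int)
    t = idx - i0

    # Trilinear interpolation over the 8 surrounding samples.
    value = 0.0
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                wx = 1 - t[0] if dx == 0 else t[0]
                wy = 1 - t[1] if dy == 0 else t[1]
                wz = 1 - t[2] if dz == 0 else t[2]
                value += wx * wy * wz * grid.values[i0[0] + dx, i0[1] + dy, i0[2] + dz]
    return value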

A gallery of detailed and diverse 3D generation results.

Accurate and Efficient Representation


As the qualitative comparison above shows, our GALA representation captures distinct geometries accurately. The plot on the left shows representation accuracy (Chamfer Distance) versus parameter count. It conveys the same conclusion: GALA not only represents geometry accurately, but does so with one of the lowest parameter counts. Here, † denotes the non-quantized version of GALA. With the default quantization, GALA requires only ~0.28 MB of storage per object.
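As a quick sanity check on these numbers (assuming each of the roughly 277K parameters is stored as a single 8-bit value and bookkeeping overhead is negligible):

\[ 277\mathrm{K}\ \text{parameters} \times 8\ \text{bits} = 277\mathrm{K}\ \text{bytes} \approx 0.28\ \text{MB per object}. \]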

Efficient Implementation of GALA Fitting (~10 seconds)

We measure the runtime of our pure C++/CUDA implementation of the GALA fitting process on 250 ShapeNet Airplane (top left) and ShapeNet Chair (top right) objects. Using 6 virtualized logical cores (hyper-threaded) of an AMD EPYC 7413 @ 2.65 GHz and one NVIDIA A100, the measured fitting time stays below 10 seconds per object. Compared to other per-object fitting methods (DMTet, Mosaic-SDF) (left), our implementation fits much faster, even on a lower-end desktop (an RTX 3060 with an Intel i9-12900K).

Cascaded Generation of GALA

Thanks to the octree-forest structure of GALA, we can naturally construct the generation process in a cascaded fashion. As shown below: (1) Root Voxel Diffusion of the vector set \( X_o = \{\mathbf{x}_o\in\mathbb{R}^4|(\mathbf{p}, s)\} \); (2) Grid Config Diffusion of the vector set \( X_{\bar{V}} = \{\mathbf{x}_{\bar{V}}\in\mathbb{R}^{10}|(\mathbf{p}_g, \mathbf{q}, \mathbf{s}_g)\} \); (3) Grid Value Regression of \( X_V = \{ \mathbf{x}_V \in \mathbb{R}^{m^3} \} \). For the symbols used here, please refer to Figure 3 of the paper. Qualitative comparisons with other baselines are also shown below.
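The schematic below spells out the cascade with the per-stage dimensions listed above. The function names (sample_root_voxels, sample_grid_configs, regress_grid_values) and the conditioning interface are placeholders standing in for the trained networks, not the paper's actual models.

# Schematic of the three-stage cascade of GALA generation. Shapes follow the
# vector sets above: roots in R^4, grid configs in R^10, grid values in R^(m^3).
# The stage functions are hypothetical placeholders for the trained networks.
def generate_shape(n_roots, n_grids, m,
                   sample_root_voxels, sample_grid_configs, regress_grid_values):
    # Stage 1: Root Voxel Diffusion over X_o = {(p, s)}.
    roots = sample_root_voxels(n_roots)                          # (n_roots, 4)

    # Stage 2: Grid Config Diffusion over X_Vbar = {(p_g, q, s_g)},
    # conditioned on the sampled root voxels.
    grid_configs = sample_grid_configs(n_grids, cond=roots)      # (n_grids, 10)

    # Stage 3: Grid Value Regression of X_V for every local grid.
    grid_values = regress_grid_values(grid_configs, cond=roots)  # (n_grids, m**3)

    return roots, grid_configs, grid_values

Each stage consumes the output of the previous one, which is what enables the coarse-to-fine control used for the autocompletion application below.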

Application: Autocompletion

As shown above, given a coarse and partial input (the root voxels in orange), we first autocomplete the remaining root voxels (blue) and then generate the geometry. This coarse-to-fine generation pipeline has the potential to open up a path toward interactive 3D creation in the future.
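One simple way to realize such completion, sketched below, is inpainting-style conditional sampling: the user-provided root voxels are clamped at every denoising step while the remaining entries are generated. Whether the paper uses this exact mechanism is not stated on this page, so the denoise_step interface and the clamping scheme should be read as assumptions.

# Sketch of root-voxel autocompletion as inpainting-style conditional sampling.
# The denoise_step interface is hypothetical; clamping to the clean known voxels
# at every step is a simplification (in practice the known part would typically
# be re-noised to match the current diffusion step).
import numpy as np

def autocomplete_roots(known_roots, n_total, denoise_step, n_steps=50):
    """known_roots: (k, 4) partial set of (p, s) vectors; returns (n_total, 4)."""
    k = known_roots.shape[0]
    x = np.random.randn(n_total, 4)              # start the full set from noise

    for step in reversed(range(n_steps)):
        x = denoise_step(x, step)                # one reverse-diffusion update
        x[:k] = known_roots                      # keep the user-provided voxels fixed

    return x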

Texturing The Generated Mesh

First carving the mesh and then adding textures is a standard workflow in industrial 3D asset creation; it offers the flexibility to create varied textures while reusing fine meshes that have already been crafted. Many deep-learning works follow this strategy. Here we use Easi-Tex to texture the generated meshes, guided by reference images, which in turn reflects the high quality of our generated meshes.

BibTeX


      @article{yang2024gala,
        title={GALA: Geometry-Aware Local Adaptive Grids for Detailed 3D Generation},
        author={Yang, Dingdong and Wang, Yizhi and Schindler, Konrad and Amiri, Ali Mahdavi and Zhang, Hao},
        journal={arXiv preprint arXiv:2410.10037},
        year={2024}
      }