Meta 3D AssetGen: Transforming 3D Modeling with AI

In an article recently posted to the Meta Research website, researchers introduced "Meta 3D AssetGen," a system for generating high-quality three-dimensional (3D) meshes with physically-based rendering (PBR) materials from text prompts or images. The goal was to transform 3D modeling by enabling fast, robust generation of meshes with controllable, high-quality PBR materials.

Study: Meta 3D AssetGen: Transforming 3D Modeling with AI. Image Credit: ME Image/Shutterstock.com

Background

Generating 3D objects from text prompts or images has broad potential in 3D animation, graphics, augmented reality/virtual reality (AR/VR), and gaming. However, despite rapid progress in image and video generation, the quality of 3D generators still falls short of professional use. Common issues include slow generation, defects in the resulting meshes and textures, and unrealistic material properties and lighting effects.

Physically-based rendering (PBR) is a technique that simulates the interaction of light with objects using realistic material parameters such as albedo (the unshaded base color), metalness, and roughness. PBR allows realistic relighting of 3D objects in different environments, which is essential for 3D computer graphics. However, most existing 3D generators output objects with baked illumination, either view-dependent or view-independent, which ignores the model's response to environmental illumination and results in visually unconvincing outputs.
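To make these parameters concrete, the sketch below shows a minimal single-light PBR shading step in Python. The material fields, the GGX distribution term, and the stripped-down Cook-Torrance lobe are illustrative assumptions, not AssetGen's actual renderer.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class PBRMaterial:
    albedo: np.ndarray   # unshaded base color in [0, 1]^3
    metalness: float     # 0 = dielectric, 1 = metal
    roughness: float     # microfacet roughness in (0, 1]

def ggx_ndf(n_dot_h: float, roughness: float) -> float:
    """GGX/Trowbridge-Reitz normal distribution function."""
    a2 = roughness ** 4  # Disney remapping: alpha = roughness^2
    denom = n_dot_h ** 2 * (a2 - 1.0) + 1.0
    return a2 / (np.pi * denom ** 2)

def shade(n, l, v, mat: PBRMaterial, light_rgb):
    """One directional light, simplified Cook-Torrance shading."""
    h = (l + v) / np.linalg.norm(l + v)  # half vector
    n_dot_l = max(float(n @ l), 0.0)
    n_dot_h = max(float(n @ h), 0.0)
    # Diffuse lobe: metals contribute no diffuse reflection.
    diffuse = mat.albedo * (1.0 - mat.metalness) / np.pi
    # Specular lobe (distribution term only, for brevity).
    f0 = 0.04 * (1.0 - mat.metalness) + mat.albedo * mat.metalness
    specular = f0 * ggx_ndf(n_dot_h, mat.roughness)
    return (diffuse + specular) * light_rgb * n_dot_l
```

Because a PBR asset stores these parameters rather than baked colors, the same mesh can be re-shaded under any illumination simply by re-evaluating such a function with a new light.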

About the Research

In this paper, the authors proposed Meta 3D AssetGen, a significant advancement in text-to-3D generation and sparse-view reconstruction that produces high-quality meshes with texture and material control. The method employs a two-stage design inspired by a previous technique called Instant3D. The first stage, text-to-image, takes text as input and generates a 4-view grid of images with 6 channels: 3 for the shaded appearance and 3 for the corresponding albedo (unshaded) version.
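As a rough illustration of this first-stage interface, the sketch below uses a hypothetical stub function; the grid resolution and channel ordering are assumptions for illustration only.

```python
import torch

def text_to_views(prompt: str) -> torch.Tensor:
    """Hypothetical stand-in for stage 1: a text-to-image diffusion
    model emits four views of the object, each with six channels
    (RGB shaded appearance plus RGB albedo)."""
    return torch.rand(4, 6, 256, 256)  # (views, channels, height, width)

views = text_to_views("a ceramic teapot shaped like a dragon")
shaded, albedo = views[:, :3], views[:, 3:]  # split the two channel groups
```

Predicting shaded and albedo channels jointly gives the second stage direct evidence for separating lighting from intrinsic color rather than inferring it afterwards.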

The second stage, image-to-3D, uses a PBR-based sparse-view reconstruction model (MetaILRM) and a texture refiner. MetaILRM reconstructs the 3D appearance and shape from the views, using a signed-distance function (SDF) for reliable 3D shape representation and deferred shading loss for efficient supervision. The texture refiner enhances the extracted albedo and materials by fusing information from the original views.
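The deferred-shading idea can be sketched as follows: the model renders screen-space material buffers and applies shading, and the photometric loss, once per pixel. The Lambertian-only shading model and the function signature here are simplifying assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def deferred_shading_loss(gbuffer: dict, gt_image: torch.Tensor, light: dict) -> torch.Tensor:
    """gbuffer holds per-pixel maps rendered from the 3D model:
    'albedo' (B,3,H,W), 'metalness' (B,1,H,W), 'normal' (B,3,H,W).
    light holds a unit 'direction' (3,) and a color 'rgb' (3,)."""
    n = F.normalize(gbuffer["normal"], dim=1)
    n_dot_l = (n * light["direction"].view(1, 3, 1, 1)).sum(1, keepdim=True).clamp(min=0)
    # Lambertian diffuse only; metals reflect no diffuse light.
    diffuse = gbuffer["albedo"] * (1 - gbuffer["metalness"]) * n_dot_l
    shaded = diffuse * light["rgb"].view(1, 3, 1, 1)
    return F.l1_loss(shaded, gt_image)
```

Shading in image space means the BRDF is evaluated once per pixel instead of at every 3D sample along every ray, which is what makes this form of supervision efficient.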

The researchers trained and evaluated their method on an internal dataset of 140,000 artist-created meshes spanning diverse semantic categories, along with a subset of 10,000 high-quality 3D samples captioned by a Cap3D-like pipeline. Additionally, they used a subset of 332 meshes from Google Scanned Objects for sparse-view reconstruction and an internal dataset of 256 artist-created 3D meshes for PBR reconstruction. They then compared their method with state-of-the-art methods from academia and industry at comparable inference times and conducted extensive user studies to assess the visual quality and text alignment of the generated meshes.

Research Findings

The authors demonstrated the effectiveness of Meta 3D AssetGen on text-to-3D and image-to-3D tasks through extensive experiments and user studies. For image-to-3D, the method achieved state-of-the-art performance among existing mesh-reconstruction methods, with the accuracy of the recovered shaded and PBR texture maps measured by peak signal-to-noise ratio (PSNR) and learned perceptual image patch similarity (LPIPS). For text-to-3D, it outperformed methods from industry and academia in visual quality and prompt alignment at comparable inference times.
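For reference, the sketch below shows how these two metrics are typically computed on rendered views; it relies on the openly available lpips package, and the random tensors are stand-ins rather than AssetGen outputs.

```python
import torch
import lpips  # pip install lpips

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio for images in [0, max_val]; higher is better."""
    mse = torch.mean((pred - target) ** 2)
    return float(10 * torch.log10(max_val ** 2 / mse))

lpips_fn = lpips.LPIPS(net="vgg")  # perceptual distance; lower is better

pred = torch.rand(1, 3, 256, 256)    # stand-in rendered view
target = torch.rand(1, 3, 256, 256)  # stand-in ground-truth view
print("PSNR:", psnr(pred, target))
print("LPIPS:", float(lpips_fn(pred * 2 - 1, target * 2 - 1)))  # lpips expects [-1, 1]
```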

In a large-scale user study using 404 deduplicated text prompts from DreamFusion, the researchers collected 11,080 responses. Meta 3D AssetGen achieved a 72% human-preference rate over the best industry competitors of similar speed, including those supporting PBR. The method also generated diverse and faithful results for open-vocabulary prompts, and it supports fine-grained material control and realistic relighting by adopting the Disney GGX model for the bidirectional reflectance distribution function (BRDF) together with a deferred shading loss for efficient supervision.
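For reference, the heart of this BRDF is the GGX microfacet normal distribution; one common form, written with Disney's roughness remapping \(\alpha = r^2\), is

$$D_{\mathrm{GGX}}(\mathbf{h}) = \frac{\alpha^{2}}{\pi\left[(\mathbf{n}\cdot\mathbf{h})^{2}\,(\alpha^{2}-1)+1\right]^{2}},$$

where \(\mathbf{n}\) is the surface normal, \(\mathbf{h}\) is the half vector between the light and view directions, and \(r\) is the roughness parameter the model predicts.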

Applications

Meta 3D AssetGen has various implications for 3D graphics, animation, gaming, and AR/VR. It can generate high-quality, realistic 3D meshes with PBR materials from text prompts or images in under 30 seconds, making it much faster and more robust than existing methods that rely on baked lighting or opacity fields. This enables creative and interactive content creation, such as rapid prototyping of assets for games, films, and immersive experiences. It also supports fine-grained material control and relighting of the generated 3D assets, making them suitable for different environments and applications.

Conclusion

In summary, Meta 3D AssetGen proved effective for generating 3D assets with controllable PBR materials from text prompts and images. The system produced assets with high-quality meshes, textures, and PBR materials through several key innovations: generating multi-view grids with both shaded and albedo channels, introducing a new reconstruction network that predicts PBR materials, and using deferred shading to train that network.

The system improved geometry with a scalable SDF-based renderer and SDF loss. Additionally, it introduced a new texture refinement network. Moving forward, the researchers suggested extending the model to handle more complex scenes with multiple objects, backgrounds, and occlusions. They also recommended incorporating more advanced PBR models and lighting effects.

Journal reference:
Siddiqui, Y., Monnier, T., Kokkinos, F., et al. (2024). Meta 3D AssetGen: Text-to-Mesh Generation with High-Quality Geometry, Texture, and PBR Materials. arXiv. https://arxiv.org/abs/2407.02445

Written by

Muhammad Osama

Muhammad Osama is a full-time data analytics consultant and freelance technical writer based in Delhi, India. He specializes in transforming complex technical concepts into accessible content. He has a Bachelor of Technology in Mechanical Engineering with specialization in AI & Robotics from Galgotias University, India, and he has extensive experience in technical content writing, data science and analytics, and artificial intelligence.

