TriPlaneNet: An Encoder for EG3D Inversion

WACV 2024
Technical University of Munich (TUM)


Update (02.11): Code for TriPlaneNet v2 is released.


Update (29.10): The second version of the article has been accepted to WACV 2024 and features additional contributions and better results. Code for TriPlaneNet v2 is coming soon.

TriPlaneNet inverts an input image into the latent space of a 3D GAN for novel view rendering.

Abstract

Recent progress in NeRF-based GANs has introduced a number of approaches for high-resolution and high-fidelity generative modeling of human heads with a possibility for novel view rendering.

At the same time, one must solve an inverse problem to be able to re-render or modify an existing image or video. Despite the success of universal optimization-based methods for 2D GAN inversion, those applied to 3D GANs may fail to extrapolate the result onto novel views, whereas optimization-based 3D GAN inversion methods are time-consuming, requiring at least several minutes per image. Fast encoder-based techniques, such as those developed for StyleGAN, may also be less appealing due to their lack of identity preservation.

Our work introduces a fast technique that bridges the gap between the two approaches by directly utilizing the tri-plane representation introduced with the EG3D generative model. In particular, we build upon a feed-forward convolutional encoder for the latent code and extend it with a fully-convolutional predictor of tri-plane numerical offsets. The renderings are similar in quality to those produced by optimization-based techniques and outperform those of encoder-based methods. As we empirically show, this is a consequence of operating directly in the tri-plane space rather than in the GAN parameter space, while making use of an encoder-based trainable approach. Finally, we demonstrate significantly more correct embedding of a face image in 3D than all the baselines, further strengthened by a probably symmetric prior enabled during training.
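To make the runtime gap concrete, below is a minimal sketch of the optimization-based route, assuming a hypothetical EG3D-style API (generator.mean_latent and generator.synthesis are illustrative names, not the released interface); real pipelines add perceptual and identity losses and run hundreds of such steps, which is where the per-image minutes go.

import torch
import torch.nn.functional as F

def invert_by_optimization(generator, target, cam, num_steps=500, lr=0.01):
    # Start from the average latent code and optimize it directly against
    # the input view; every step runs a full forward/backward pass
    # through the generator, so per-image cost grows with num_steps.
    w = generator.mean_latent().clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(num_steps):
        opt.zero_grad()
        pred = generator.synthesis(w, cam)   # render from the input camera
        loss = F.mse_loss(pred, target)      # + LPIPS / ID terms in practice
        loss.backward()
        opt.step()
    return w.detach()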

Video

Re-rendering the Input Video

Using TriPlaneNet, you can re-render in-the-wild videos from novel views. The framework captures fine details of in-the-wild portrait imagery and supports complex facial expressions.
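A minimal sketch of such re-rendering, under stated assumptions: render is a hypothetical helper mapping refined tri-planes and a camera pose to an RGB frame (as EG3D's renderer block does), and camera_at builds a pose on a smooth trajectory.

import torch

@torch.no_grad()
def rerender_video(render, tri_planes, camera_at,
                   yaw_range=(-0.4, 0.4), num_frames=60):
    # Sweep the camera along a yaw trajectory and re-render each frame
    # from the tri-planes recovered for the inverted input frame.
    frames = []
    for yaw in torch.linspace(yaw_range[0], yaw_range[1], num_frames):
        cam = camera_at(yaw=float(yaw), pitch=0.0)
        frames.append(render(tri_planes, cam))
    return torch.stack(frames)  # (num_frames, 3, H, W)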

Live Demo

The first version of TriPlaneNet runs in real time on a single RTX 3090 GPU.

Method Overview

Inversion is performed in two phases.

In the first phase, given an input image, an encoder predicts a pivotal latent code, which is used to obtain an initial input-view RGB image and a mirror-view RGB image.
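In sketch form, with encoder, eg3d.backbone, and eg3d.render as illustrative names rather than the released API:

import torch

def phase_one(encoder, eg3d, image, cam, mirror_cam):
    # Predict the pivotal latent code with a feed-forward encoder, then
    # render the scene from the input view and its mirrored view.
    w = encoder(image)                                  # pivotal latent code
    tri_planes = eg3d.backbone(w)                       # tri-plane features
    recon = eg3d.render(tri_planes, cam)                # initial input-view RGB
    recon_mirror = eg3d.render(tri_planes, mirror_cam)  # mirror-view RGB
    return w, tri_planes, recon, recon_mirror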

In the second phase, an auto-encoder estimates tri-plane offsets from the initial input-view reconstruction, the difference between the input image and the input-view reconstruction, the difference between the mirrored input image and the mirror-view reconstruction, and the tri-plane features from the first branch. The offsets are numerically added to the tri-planes output by the EG3D generator, and the final reconstruction is obtained by passing the refined tri-planes through the renderer block.
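The same phase in sketch form; offset_net is a hypothetical stand-in for the fully-convolutional auto-encoder, and all tensors are assumed to share one spatial resolution here (in practice the images and tri-planes would be resized to match before concatenation).

import torch

def phase_two(offset_net, eg3d, image, mirror_image, cam,
              tri_planes, recon, recon_mirror):
    # Stack the inputs channel-wise: the initial reconstruction, the two
    # reconstruction residuals, and the first-branch tri-plane features.
    inputs = torch.cat([
        recon,
        image - recon,                 # input-view residual
        mirror_image - recon_mirror,   # mirror-view residual
        tri_planes,
    ], dim=1)
    offsets = offset_net(inputs)       # predicted tri-plane offsets
    refined = tri_planes + offsets     # numerically added to EG3D's tri-planes
    return eg3d.render(refined, cam)   # final reconstruction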

Reconstruction and Novel View Comparison

TriPlaneNet reconstructs faces in greater detail, with notably higher fidelity for features such as hats, hair, and backgrounds.

For novel view rendering, TriPlaneNet preserves identity and multi-view consistency better than the other approaches.

The inference time is given for a single RTX 3090 GPU.

3D Geometry Comparison

TriPlaneNet estimates a view-consistent embedding of a head in 3D from a single image.


Related Links

For more work on similar tasks, please check out

PTI: Pivotal Tuning for Latent-based Editing of Real Images introduces an optimization mechanism for solving the StyleGAN inversion task.

Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation presents an encoder-based approach to embed input images into the W+ space of StyleGAN.

BibTeX

@inproceedings{bhattarai2024triplanenet,
  title={TriPlaneNet: An Encoder for EG3D Inversion},
  author={Bhattarai, Ananta R. and Nie{\ss}ner, Matthias and Sevastopolsky, Artem},
  booktitle={IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
  year={2024}
}