Keys to Better Image Inpainting:
Structure and Texture Go Hand in Hand

WACV 2023

Jitesh Jain¹,²,³*   Yuqian Zhou⁴*   Ning Yu⁵   Humphrey Shi¹,³

¹SHI Lab @ University of Oregon   ²IIT Roorkee   ³Picsart AI Research   ⁴Adobe Inc.   ⁵Salesforce Research

*Equal Contribution


ArXiv Paper · GitHub Code · HuggingFace Space · Colab Demo · Citation BibTeX

Abstract

Deep image inpainting has made impressive progress with recent advances in image generation and processing algorithms. We claim that the performance of inpainting algorithms can be better judged by the generated structures and textures. Structures refer to generated object boundaries or novel geometric structures within the hole, while textures refer to high-frequency details, especially man-made repeating patterns filled inside the structural regions. We believe that better structures are usually obtained from a coarse-to-fine, GAN-based generator network, while repeating patterns nowadays can be better modeled using state-of-the-art fast Fourier convolutional layers, which excel at high-frequency content. In this paper, we propose a novel inpainting network combining the advantages of the two designs. As a result, our model achieves remarkable visual quality, matching state-of-the-art performance in both structure generation and repeating-texture synthesis with a single network. Extensive experiments demonstrate the effectiveness of the method, and our conclusions further highlight the two critical factors of image inpainting quality, structures and textures, as future design directions for inpainting networks.


Method

In this paper, we revisit the core design ideas of state-of-the-art deep inpainting networks. We propose an intuitive and effective inpainting architecture that augments the powerful co-modulated StyleGAN2 generator with the large effective receptive field of fast Fourier convolutions (FFC), achieving equally good performance on both textures and structures. Specifically, we generate image structures with a coarse-to-fine, StyleGAN-based generation scheme. Meanwhile, we merge the generated coarse features with the skip features from the encoder and pass them through a Fast Fourier Synthesis (FaF-Syn) module to better generate repeating textures. The convolutional layers inside FaF-Syn are co-modulated using the encoded features and the style mapping of the latent noise vector. Our idea is simple yet effective, making structures and textures well synthesized within a single network.
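To make the fusion concrete, below is a minimal PyTorch sketch of the idea, not the authors' released implementation. A spectral block in the spirit of FFC gives every output location an image-wide receptive field, and a hypothetical FaFSynSketch stage fuses coarse generator features with encoder skip features before refining textures. The FourierUnit and FaFSynSketch names, the layer sizes, and the FiLM-like style scaling (a simplified stand-in for full StyleGAN2 co-modulation) are all illustrative assumptions.

import torch
import torch.nn as nn

class FourierUnit(nn.Module):
    """FFC-style spectral block (in the spirit of LaMa's fast Fourier
    convolutions): a 1x1 convolution applied in the frequency domain gives
    every output location an image-wide receptive field."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels * 2, channels * 2, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        freq = torch.fft.rfft2(x, norm="ortho")           # (b, c, h, w//2+1), complex
        freq = torch.cat([freq.real, freq.imag], dim=1)   # stack real/imag parts
        freq = self.act(self.conv(freq))                  # mix channels globally
        real, imag = freq.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")

class FaFSynSketch(nn.Module):
    """Hypothetical FaF-Syn stage: fuse the generator's coarse features with
    the encoder's skip features, scale by a style code (a FiLM-like stand-in
    for the paper's full StyleGAN2 co-modulation), then refine textures with
    a residual Fourier unit."""
    def __init__(self, channels: int, style_dim: int):
        super().__init__()
        self.fuse = nn.Conv2d(channels * 2, channels, kernel_size=3, padding=1)
        self.affine = nn.Linear(style_dim, channels)
        self.fourier = FourierUnit(channels)

    def forward(self, coarse_feat, skip_feat, style):
        x = self.fuse(torch.cat([coarse_feat, skip_feat], dim=1))
        s = self.affine(style).unsqueeze(-1).unsqueeze(-1)  # per-channel scales
        x = x * (1.0 + s)                                   # simplified co-modulation
        return x + self.fourier(x)                          # spectral texture refinement

In this sketch, calling FaFSynSketch(channels=256, style_dim=512) on a pair of coarse and skip feature maps plus a style vector returns texture-refined features; the residual connection preserves the structure from the coarse branch while the spectral path adds repeating high-frequency detail.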

Qualitative Results

Our model preserves repeating textures much better than CoModGAN. CoModGAN has no attention-related modules, so high-frequency features cannot be effectively reused given its limited receptive field. Our model enlarges the receptive field using fast Fourier layers and effectively renders source textures onto newly generated structures. Meanwhile, ours also outperforms LaMa in generating object boundaries and structures. LaMa produces fading-out artifacts when the hole reaches an image or object boundary, and it cannot hallucinate good structural information for large holes spanning long pixel ranges. Ours, in contrast, leverages the coarse-to-fine generator to synthesize clearer object boundaries. In conclusion, our model integrates the advantages of the two state-of-the-art designs and simultaneously generates remarkable structures and textures.

More Scene Completion Results

Texture Completion Results

Face Inpainting Results

When testing on face images, especially when half of the face is covered, LaMa generates fading-out hair on the forehead, and CoModGAN may borrow another identity's eyes to complete the image. Although both obtain good quantitative numbers, these failure modes show that neither model is fully robust. Ours synthesizes plausible hair and forehead shapes while keeping eye and eyebrow appearance as consistent as LaMa does. We therefore conclude that our proposed model works consistently well on both image structures and textures.


Citation

If you find our work useful in your research, please consider starring ⭐ us on GitHub and citing 📚 our paper!

@inproceedings{jain2022keys,
title={Keys to Better Image Inpainting: Structure and Texture Go Hand in Hand},
author={Jitesh Jain and Yuqian Zhou and Ning Yu and Humphrey Shi},
booktitle={WACV},
year={2023}
}