Reference-based Painterly Inpainting via Diffusion:
Crossing the Wild Reference Domain Gap

1 The University of Texas at Austin, 2 SHI Labs @ Georgia Tech & UIUC, 3 Picsart AI Research (PAIR)

TL;DR: Our proposed Reference-based Painterly Inpainting framework (RefPaint) lets users control the strength of reference semantics and background style when performing inpainting. Compared to Stable Diffusion, which relies on text prompts as the reference, RefPaint captures the reference information more faithfully and generates styles consistent with the background artwork.

Abstract

Have you ever imagined how it would look if we placed new objects into paintings? For example, what would it look like if we placed a basketball into Claude Monet's "Water Lilies, Evening Effect"?
We propose Reference-based Painterly Inpainting, a novel task that crosses the wild reference domain gap and implants novel objects into artworks. Although previous works have examined reference-based inpainting, they are not designed for large domain discrepancies between the target and the reference, such as inpainting an artistic image using a photorealistic reference. This paper proposes a novel diffusion framework, dubbed RefPaint, to "inpaint more wildly" by taking such references with large domain gaps. Building on an image-conditioned diffusion model, we introduce a ladder-side branch and a masked fusion mechanism to work with the inpainting mask. By decomposing the CLIP image embeddings at inference time, one can manipulate the strength of semantic and style information with ease. Experiments demonstrate that our proposed RefPaint framework produces significantly better results than existing methods. Our method enables creative painterly image inpainting with reference objects that would otherwise be difficult to achieve.
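To make the inference-time control concrete, below is a minimal sketch of how reference semantics and background style could be blended in CLIP embedding space. The exact decomposition RefPaint uses is not spelled out on this page, so the sketch simply assumes the reference image supplies the semantic embedding and the background supplies the style embedding; `mixed_condition`, `w_sem`, and `w_style` are hypothetical names, not the paper's API.

```python
# Hedged sketch: mixing reference semantics and background style at inference.
# Assumption: the reference embedding acts as the semantic source and the
# background embedding as the style source; the paper's actual decomposition
# of CLIP image embeddings may differ.
import torch
from PIL import Image
from transformers import CLIPImageProcessor, CLIPVisionModelWithProjection

processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
encoder = CLIPVisionModelWithProjection.from_pretrained("openai/clip-vit-large-patch14")

@torch.no_grad()
def mixed_condition(ref_img: Image.Image, bg_img: Image.Image,
                    w_sem: float = 1.0, w_style: float = 0.5) -> torch.Tensor:
    """Blend reference-semantic and background-style CLIP image embeddings."""
    inputs = processor(images=[ref_img, bg_img], return_tensors="pt")
    embeds = encoder(**inputs).image_embeds            # (2, 768) for ViT-L/14
    embeds = embeds / embeds.norm(dim=-1, keepdim=True)
    e_ref, e_bg = embeds[0], embeds[1]
    cond = w_sem * e_ref + w_style * e_bg              # hypothetical linear mix
    return cond / cond.norm()                          # renormalize for conditioning
```

Raising `w_sem` would push the result toward the reference object's content, while raising `w_style` would pull it toward the background painting's appearance.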

Task Comparison

Method Overview

Given an input quadruplet $(I_r, I_{bg}, M_o, M_{bg})$ consisting of an object-centric reference image $I_r \in R^{H_r \times W_r \times 3}$, a background image $I_{bg} \in R^{H_{bg} \times W_{bg} \times 3}$, and their corresponding binary masks $M_o \in R^{H_r \times W_r \times 1}$ and $M_{bg} \in R^{H_{bg} \times W_{bg} \times 1}$, the goal is to inpaint the reference object $I_r$ into the masked region $I_{bg} \odot M_{bg}$. We utilize a ladder-side branch and a masked fusion block to incorporate the additional mask information. The framework is trained in a self-supervised manner.
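As a rough illustration of the masked fusion idea, the sketch below blends ladder-side features into the backbone features only inside the inpainting mask. The internals of the ladder-side branch are not described in this overview, so a single convolution with a zero-initialized gate stands in for it; `MaskedFusion` and its layers are assumptions for illustration, not the paper's implementation.

```python
# Hedged sketch of a masked fusion block: inject ladder-side (reference-aware)
# features into the backbone features only where the inpainting mask is active.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedFusion(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Stand-in for the ladder-side branch (assumption: a single conv).
        self.side = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Learned fusion strength, zero-initialized so training starts
        # from the unmodified backbone.
        self.gate = nn.Parameter(torch.zeros(1))

    def forward(self, h: torch.Tensor, h_side: torch.Tensor,
                mask: torch.Tensor) -> torch.Tensor:
        # h:      backbone features            (B, C, H, W)
        # h_side: ladder-side branch features  (B, C, H, W)
        # mask:   binary inpainting mask       (B, 1, H0, W0), 1 = inpaint region
        m = F.interpolate(mask, size=h.shape[-2:], mode="nearest")
        fused = h + self.gate * self.side(h_side)
        # Fuse only inside the masked region; keep backbone features outside.
        return m * fused + (1.0 - m) * h
```

Gating the side features with the mask keeps the unmasked background untouched, which matches the self-supervised setup where only the masked region must be reconstructed.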

Results

Columns, left to right: Original, Reference, RefPaint, Stable Diffusion.
Visual results of our Reference-based Painterly Inpainting framework (RefPaint) using random inpainting masks and random reference objects from the COCO Captions dataset. The blue bounding box marks the edited region to be inpainted; red boundaries indicate the reference object.

Citation

If you find our work helpful, please cite:

@article{xu2023refpaint,
    author  = {},
    title   = {Reference-based Painterly Inpainting via Diffusion: Crossing the Wild Reference Domain Gap},
    journal = {arXiv preprint},
    year    = {2023}
}