Many existing tools allow us to edit the photographs we take, from making an object in a photo pop to visualizing what a spare room might look like in the color mauve. Smoothly controllable (or parametric) edits are ideal because they provide precise control over, say, how shiny an object appears (e.g., a coffee cup) or the exact shade of paint on a wall. However, making these kinds of edits while preserving photorealism typically requires expert-level skill with existing programs, and enabling everyday users to do so has remained a difficult problem in computer vision.
Previous approaches like intrinsic image decomposition break down an image into layers representing “fundamental” visual components, such as base color (also known as “albedo”), specularity, and lighting conditions. These decomposed layers can be individually edited and recombined to form a photorealistic image. The challenge is that there is a great deal of ambiguity in determining these visual components: Does a ball look darker on one side because its color is darker or because it’s in shadow? Is that bright spot a highlight from a strong light, or is the surface simply white there? People can usually disambiguate these cues, yet even we are occasionally fooled, making this a hard problem for computers.
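To make the decomposition idea concrete, here is a minimal sketch of the simplest intrinsic-image model, where an image is the element-wise product of an albedo layer and a shading layer. This toy example ignores specularity, and all arrays and color values are illustrative assumptions, not taken from any actual decomposition method:

```python
import numpy as np

# Toy intrinsic-image model: image = albedo * shading (element-wise).
# Albedo holds the per-pixel base color; shading holds light intensity.
albedo = np.full((2, 2, 3), [0.8, 0.2, 0.2])   # a reddish base color
shading = np.array([[[1.0], [0.5]],
                    [[0.5], [0.25]]])          # per-pixel light intensity

image = albedo * shading                       # recombine the layers

# A parametric edit on one layer: repaint red -> blue in albedo while
# keeping the shading layer, so shadows and highlights are preserved.
edited_albedo = np.full((2, 2, 3), [0.2, 0.2, 0.8])
edited_image = edited_albedo * shading
```

Because the edit touches only the albedo layer, the shading pattern (and hence the perceived geometry and lighting) carries over unchanged into the recombined result. The ambiguity described above is precisely that, given only `image`, many different `albedo`/`shading` pairs multiply to the same pixels.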
Other recent approaches leverage generative text-to-image (T2I) models, which excel at photorealistic image generation, to edit objects in images. However, these approaches struggle to disentangle material and shape information. For example, trying to change the color of a house from blue to yellow may also change its shape. We observe similar issues in StyleDrop, which can generate different appearances but does not preserve object shape between styles. Could we find a way to edit the material appearance of an object while preserving its geometric shape?
In “Alchemist: Parametric Control of Material Properties with Diffusion Models”, published at CVPR 2024, we introduce a technique that harnesses the photorealistic prior of T2I models to give users parametric editing control over specific material properties (roughness, metallic appearance, base color saturation, and transparency) of an object in an image. We demonstrate that our parametric editing model can change an object’s material properties while preserving its geometric shape, and can even fill in the background behind the object when it is made transparent.