EEdit⚡: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing

¹SJTU, ²HKUST
*Indicates Equal Contribution

Corresponding Author

Gallery of various editing tasks and efficiency comparisons.
We propose EEdit, an inversion-based framework for efficient image editing. Compared with previous methods, EEdit achieves faster and more efficient image editing.

Abstract

Inversion-based image editing is rapidly gaining momentum, but it suffers from significant computational overhead, which hinders its application in real-time interactive scenarios. In this paper, we revisit inversion-based image editing and find that redundancy exists in both the spatial and temporal dimensions, for example unnecessary computation in unedited regions and redundancy in the inversion process. To tackle these challenges, we propose a practical framework, named EEdit, to achieve efficient image editing. Specifically, we introduce three techniques, each addressing one source of redundancy. For spatial redundancy, spatial locality caching computes only the edited region and its neighboring regions while skipping unedited regions, and token indexing preprocessing further accelerates the caching. For temporal redundancy, inversion step skipping reuses the latent for efficient editing. Our experiments demonstrate an average 2.46× acceleration without performance drop across a wide range of editing tasks, including prompt-guided image editing, dragging, and image composition.
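The spatial idea (compute the edited region plus its neighbours, skip the rest, and precompute the token indices once) can be illustrated with a small sketch. It assumes the edit region is given as a binary mask over the latent patch grid; the function name `precompute_token_indices`, the square neighbourhood, and the `radius` parameter are hypothetical choices for illustration, not the authors' code.

```python
from typing import List, Set


def precompute_token_indices(mask: List[List[int]], radius: int) -> List[int]:
    """Flat (row-major) indices of edited tokens plus neighbours within `radius`.

    `mask` is a hypothetical H x W binary grid over latent patches (1 = edited).
    The square (Chebyshev-distance) neighbourhood is an illustrative assumption.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    selected: Set[int] = set()
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            # Include every in-bounds token in the (2*radius+1)^2 window.
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    selected.add(ny * width + nx)
    # Sorted indices can be reused at every step to gather/scatter tokens.
    return sorted(selected)


if __name__ == "__main__":
    mask = [[0] * 4 for _ in range(4)]
    mask[1][1] = 1  # a single edited patch
    idx = precompute_token_indices(mask, radius=1)
    print(idx)  # [0, 1, 2, 4, 5, 6, 8, 9, 10]
    print(f"computed {len(idx)} of 16 tokens")
```

For a 4×4 grid with one edited patch at row 1, column 1 and `radius=1`, the sketch selects 9 of the 16 tokens; the remaining 7 are candidates for skipping. Computing the index list once, rather than re-deriving the region at every timestep, is the kind of saving the preprocessing step targets.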

Methods


Overview of our approach. The proposed image editing framework, built on MM-DiT diffusion models, is training-free and uses an efficient denoising process. The pipeline takes the original image and an editing prompt as inputs. Specifically, the cache is fully refreshed at a fixed timestep interval, while the intermediate timesteps perform only partial computation to update the cache.
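The refresh schedule described above can be sketched as follows. This assumes, hypothetically, that a full computation happens at the first step and every `refresh_interval` steps after it, with partial updates on all other steps; the function names and the exact alignment of refresh steps are illustrative assumptions, not details confirmed by the paper.

```python
from typing import List


def step_mode(step: int, refresh_interval: int) -> str:
    """Return 'full' on refresh steps and 'partial' on the steps in between.

    Assumption: refreshes happen at step 0 and every `refresh_interval` steps after.
    """
    if refresh_interval < 1:
        raise ValueError("refresh_interval must be >= 1")
    return "full" if step % refresh_interval == 0 else "partial"


def build_schedule(num_steps: int, refresh_interval: int) -> List[str]:
    # One entry per denoising timestep.
    return [step_mode(t, refresh_interval) for t in range(num_steps)]


if __name__ == "__main__":
    sched = build_schedule(num_steps=7, refresh_interval=3)
    print(sched)
    # ['full', 'partial', 'partial', 'full', 'partial', 'partial', 'full']
    print(sched.count("full"), "full refreshes,", sched.count("partial"), "partial steps")
```

With `refresh_interval=1` every step is a full computation, which amounts to running without the cache; larger intervals reuse cached results across more steps between full refreshes.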


The pipeline of spatial locality caching.
(1) The initialization and refresh process of cache storage using the computed results from SA (Self-Attention), CA (Cross-Attention), and MLP.
(2) The token-wise partial computation logic and the cache update mechanism.
(3) The initialization and update logic for scoring, which is responsible for selecting indices for partial computation.
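The three parts of this pipeline (cache initialization and refresh, token-wise partial computation with cache update, and scoring to select indices) can be combined into one toy sketch. Each token is a single float, and `layer` stands in for one SA, CA, or MLP block. The class `TokenCache`, the staleness-plus-edit-bonus scoring rule, and top-`k` selection are hypothetical choices for illustration, not the scoring function used by EEdit.

```python
from typing import Callable, List

# Hypothetical stand-in for one SA, CA or MLP block applied to a single token.
LayerFn = Callable[[float, int], float]


class TokenCache:
    """Toy per-token cache for a single block (illustrative, not EEdit's code).

    Hypothetical scoring rule: a token's score is the number of steps since it
    was last recomputed, plus a fixed bonus if it lies in the edit region.
    """

    def __init__(self, edit_indices: List[int], edit_bonus: float = 10.0) -> None:
        self.edit = set(edit_indices)
        self.edit_bonus = edit_bonus
        self.outputs: List[float] = []
        self.staleness: List[int] = []

    def full_refresh(self, tokens: List[float], step: int, layer: LayerFn) -> List[float]:
        # (1) Compute every token, then (re)initialise the cache and the scores.
        self.outputs = [layer(x, step) for x in tokens]
        self.staleness = [0] * len(tokens)
        return list(self.outputs)

    def scores(self) -> List[float]:
        # (3) A higher score means the token is more in need of recomputation.
        return [
            s + (self.edit_bonus if i in self.edit else 0.0)
            for i, s in enumerate(self.staleness)
        ]

    def partial_step(
        self, tokens: List[float], step: int, layer: LayerFn, k: int
    ) -> List[float]:
        # (2) Recompute only the top-k tokens (ties broken by lower index)
        # and reuse cached outputs for all other tokens.
        sc = self.scores()
        chosen = set(sorted(range(len(tokens)), key=lambda i: (-sc[i], i))[:k])
        for i, x in enumerate(tokens):
            if i in chosen:
                self.outputs[i] = layer(x, step)  # write the fresh result into the cache
                self.staleness[i] = 0
            else:
                self.staleness[i] += 1  # the cached value gets one step older
        return list(self.outputs)


def toy_layer(x: float, t: int) -> float:
    # Hypothetical block: depends on the token value and the timestep.
    return 10 * x + t


if __name__ == "__main__":
    cache = TokenCache(edit_indices=[1])
    tokens = [1.0, 2.0, 3.0, 4.0]
    print(cache.full_refresh(tokens, 0, toy_layer))       # [10.0, 20.0, 30.0, 40.0]
    print(cache.partial_step(tokens, 1, toy_layer, k=1))  # [10.0, 21.0, 30.0, 40.0]
```

In the usage example, token 1 is marked as edited, so after the full refresh at step 0 the partial step with `k=1` recomputes only that token. Skipped tokens accumulate staleness, so their scores grow and they are eventually selected again.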

Gallery

BibTeX

@misc{yan2025eeditrethinkingspatial,
        title={EEdit : Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing}, 
        author={Zexuan Yan and Yue Ma and Chang Zou and Wenteng Chen and Qifeng Chen and Linfeng Zhang},
        year={2025},
        eprint={2503.10270},
        archivePrefix={arXiv},
        primaryClass={cs.CV},
        url={https://arxiv.org/abs/2503.10270}, 
  }