E-SAM: Training-Free Segment Every Entity Model

ICCV 2025


Weiming Zhang1     Dingwen Xiao1    Lei Chen1,2    Lin WANG3†   
1AI Thrust, HKUST(GZ)      2HKUST      3NTU     
† Corresponding author

E-SAM is a novel training-free framework that exhibits exceptional Entity Segmentation capability. It mitigates the over-segmentation and under-segmentation challenges inherent in AMG by integrating Multi-level Mask Generation (MMG), Entity-level Mask Refinement (EMR), and Under-Segmentation Refinement (USR). The main contributions are as follows:

  • We design E-SAM, a novel framework aimed at enhancing SAM's performance in entity segmentation.
  • We design MMG, EMR, and USR modules to ultimately produce a high-quality entity-level segmentation map.
  • We carry out extensive experiments demonstrating that E-SAM significantly outperforms both SAM and existing Entity Segmentation methods.

Entity Segmentation in Open-World Scenarios

Abstract

Entity Segmentation (ES) aims at identifying and segmenting distinct entities within an image without the need for predefined class labels. This characteristic makes ES well-suited to open-world applications with adaptation to diverse and dynamically changing environments, where new and previously unseen entities may appear frequently. Existing ES methods either require large annotated datasets or high training costs, limiting their scalability and adaptability. Recently, the Segment Anything Model (SAM), especially in its Automatic Mask Generation (AMG) mode, has shown potential for holistic image segmentation. However, it struggles with over-segmentation and under-segmentation, making it less effective for ES. In this paper, we introduce E-SAM, a novel training-free framework that exhibits exceptional ES capability. Specifically, we first propose Multi-level Mask Generation (MMG) that hierarchically processes SAM's AMG outputs to generate reliable object-level masks while preserving fine details at other levels. Entity-level Mask Refinement (EMR) then refines these object-level masks into accurate entity-level masks. That is, it separates overlapping masks to address the redundancy issues inherent in SAM's outputs and merges similar masks by evaluating entity-level consistency. Lastly, Under-Segmentation Refinement (USR) addresses under-segmentation by generating additional high-confidence masks fused with EMR outputs to produce the final ES map. These three modules are seamlessly optimized to achieve the best ES without additional training overhead.
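The two refinement ideas described above, separating overlapping masks and fusing in additional masks where coverage is missing, can be illustrated with a minimal sketch. This is not the E-SAM implementation: the function names `resolve_overlaps` and `fill_uncovered`, the score-based tie-breaking rule, and the `min_new_fraction` threshold are all assumptions made for illustration, and the sketch omits the entity-level consistency merging that EMR performs.

```python
import numpy as np

def resolve_overlaps(masks, scores):
    """Assign every pixel to at most one mask, preferring higher scores.

    Hypothetical helper, not the paper's EMR module.
    masks:  list of boolean arrays with identical shape (H, W)
    scores: list of floats, one confidence value per mask
    Returns an int label map: 0 = unassigned, k = (index of mask) + 1.
    """
    label_map = np.zeros(masks[0].shape, dtype=np.int32)
    # Paint low-scoring masks first so high-scoring ones overwrite them.
    for idx in np.argsort(scores):
        label_map[masks[idx]] = idx + 1
    return label_map

def fill_uncovered(label_map, extra_masks, min_new_fraction=0.5):
    """Add extra masks only where they mostly cover unassigned pixels.

    Hypothetical helper, loosely analogous to fusing additional masks
    into an existing map; the threshold is an assumed parameter.
    """
    out = label_map.copy()
    next_label = int(out.max()) + 1
    for mask in extra_masks:
        area = mask.sum()
        if area == 0:
            continue
        # Pixels of this mask that no existing label has claimed yet.
        new_pixels = mask & (out == 0)
        if new_pixels.sum() / area >= min_new_fraction:
            out[new_pixels] = next_label
            next_label += 1
    return out
```

In this sketch, each pixel ends up with exactly one label, so the output is a non-overlapping segmentation map. The score-ordered painting is only one simple way to break ties between overlapping masks.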

Figure: E-SAM pipeline

Comparison with Entity Segmentation methods

The quantitative comparison with other monocular Entity Segmentation methods is shown below.

Table: comparison with Entity Segmentation methods

The quantitative comparison of various entity segmentation approaches with different backbones is shown below.

Table: comparison of entity segmentation approaches with different backbones

Acknowledgements

This project is supported by ......

BibTeX

@article{zhang2025sam,
  title={E-SAM: Training-Free Segment Every Entity Model},
  author={Zhang, Weiming and Xiao, Dingwen and Chen, Lei and Wang, Lin},
  journal={arXiv preprint arXiv:2503.12094},
  year={2025}
}