ailia Tech BLOG

Grounded-SAM: Segment Any Object from Text

This is an introduction to「Grounded-SAM」, a machine learning model that can be used with ailia SDK. You can easily use this model, along with many other ready-to-use ailia MODELS, to create AI applications with ailia SDK.

Overview

Grounded-SAM is a machine learning model capable of segmenting any object specified by text.

Source: https://github.com/IDEA-Research/Grounded-Segment-Anything/blob/main/assets/demo2.jpg

GitHub: IDEA-Research/Grounded-Segment-Anything. Grounded SAM: Marrying Grounding DINO with Segment Anything & Stable Diffusion & Recognize Anything … (github.com)

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks. We introduce Grounded SAM, which uses Grounding DINO as an open-set object detector to combine with the segment… (arxiv.org)

Architecture

Grounded-SAM uses Grounding DINO to compute the bounding box of the object specified by text, and then passes that bounding box as input to the Segment Anything model to perform segmentation.

Grounded SAM architecture (Source: https://arxiv.org/abs/2401.14159)
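The hand-off between the two models can be pictured with a small sketch. This is an illustration only, not the ailia SDK code or the authors' implementation. It assumes that the detector returns boxes as normalized (center x, center y, width, height), as the upstream Grounding DINO demo does, and that the segmentation model expects corner coordinates in pixels. The function name `cxcywh_to_xyxy` is hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def cxcywh_to_xyxy(box: Box, width: int, height: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1).

    Assumption: the detector output is normalized to [0, 1] and the
    segmenter wants pixel corner coordinates as its box prompt.
    """
    cx, cy, w, h = box
    x0 = (cx - w / 2) * width
    y0 = (cy - h / 2) * height
    x1 = (cx + w / 2) * width
    y1 = (cy + h / 2) * height
    # Clamp to the image so the prompt never points outside the picture.
    return (
        max(0.0, x0),
        max(0.0, y0),
        min(float(width), x1),
        min(float(height), y1),
    )
```

For a 640x480 image, a centered box covering half of each dimension, (0.5, 0.5, 0.5, 0.5), becomes (160.0, 120.0, 480.0, 360.0).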

As an application example, combining Grounded-SAM with Stable Diffusion makes advanced image editing possible, as shown in the image above. In the third row, the user specifies “bench” by text, the bench is segmented, and Stable Diffusion then changes its appearance seamlessly.
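To see why a segmentation mask keeps an edit confined to the chosen object, consider the simplified sketch below. It is not how Stable Diffusion works internally. It only assumes that we already have an edited image of the same size and a boolean mask, and shows that pixels outside the mask stay untouched. The function name `composite` and the tiny nested-list "images" are hypothetical.

```python
from typing import List, Sequence


def composite(
    original: Sequence[Sequence[int]],
    edited: Sequence[Sequence[int]],
    mask: Sequence[Sequence[bool]],
) -> List[List[int]]:
    """Take pixels from `edited` where the mask is True, else from `original`."""
    return [
        [e if m else o for o, e, m in zip(o_row, e_row, m_row)]
        for o_row, e_row, m_row in zip(original, edited, mask)
    ]


# Toy 2x2 example: only the masked pixels are replaced.
original = [[1, 1], [1, 1]]
edited = [[9, 9], [9, 9]]
mask = [[True, False], [False, True]]
result = composite(original, edited, mask)  # [[9, 1], [1, 9]]
```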

Grounded-SAM can segment objects based on text, including complex descriptions such as “a person wearing pink clothes” or “a man wearing sunglasses”.

(Source: https://arxiv.org/abs/2401.14159)

You can refer to the following articles to get more information about the models used internally.

Grounding DINO: Detect Any Object from Text. This is an introduction to「Grounding DINO」, a machine learning model that can be used with ailia SDK. You can easily… (medium.com)

SegmentAnything: A Segmentation Model with Target Specification. This is an introduction to「SegmentAnything」, a machine learning model that can be used with ailia SDK. You can easily… (medium.com)

Generating High-Quality Images with SDXL. This article explains how to generate high-quality images using SDXL, the latest model of Stable Diffusion. (medium.com)

Usage

To use Grounded-SAM with ailia SDK, run the following command. Memory consumption is approximately 5 GB. If your VRAM is limited, add the -e 1 option to run the model on the CPU.

$ python3 grounded_sam.py -i demo.jpg --caption "The running dog."
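If you want to call the sample from another Python program, one option is to build the same command line and launch it as a subprocess. The sketch below uses only the flags mentioned above (-i, --caption, and -e 1). The helper names `build_command` and `run_grounded_sam` are hypothetical, and the sketch assumes it is run from the directory that contains grounded_sam.py.

```python
import subprocess
import sys
from typing import List


def build_command(image_path: str, caption: str, use_cpu: bool = False) -> List[str]:
    """Build the argument list for the grounded_sam.py sample script."""
    cmd = [sys.executable, "grounded_sam.py", "-i", image_path, "--caption", caption]
    if use_cpu:
        # "-e 1" is the option described above for running on the CPU.
        cmd += ["-e", "1"]
    return cmd


def run_grounded_sam(image_path: str, caption: str, use_cpu: bool = False) -> None:
    """Run the sample; raises CalledProcessError if the script fails."""
    subprocess.run(build_command(image_path, caption, use_cpu), check=True)
```

For example, `run_grounded_sam("demo.jpg", "The running dog.", use_cpu=True)` is the CPU variant of the command shown above.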

ailia-models/image_segmentation/grounded_sam at master · ailia-ai/ailia-models. The collection of pre-trained, state-of-the-art AI models for ailia SDK - ailia-models/image_segmentation/grounded_sam… (github.com)

To run Grounded-SAM, you’ll need ailia_tokenizer, which is used for the BERT tokenizer. Please install it using the following command.

pip3 install ailia_tokenizer
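Before running the sample, you may want to confirm that the package is visible to your Python environment. The following sketch uses only the standard library. It assumes that the import name matches the pip package name, ailia_tokenizer, and the helper name `is_installed` is hypothetical.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if not is_installed("ailia_tokenizer"):
    # Assumption: the module is importable under the same name as the pip package.
    print("ailia_tokenizer not found. Install it with: pip3 install ailia_tokenizer")
```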

ailia Inc. has developed ailia SDK, which enables cross-platform, GPU-based rapid inference.

ailia Inc. provides a wide range of services from consulting and model creation, to the development of AI-based applications and SDKs. Feel free to contact us for any inquiry.